Abstract

We propose to model multivariate volatility processes based on the newly defined conditionally uncorrelated components (CUCs). This model provides a parsimonious representation for matrix-valued processes. It is flexible in the sense that we may fit each CUC with any appropriate univariate volatility model. Computationally, it splits one high-dimensional optimization problem into several lower-dimensional subproblems. Consistency of the estimated CUCs is established. A bootstrap test is proposed for testing the existence of CUCs. The proposed methodology is illustrated with both simulated and real data sets.

Key words: dimension reduction, extended GARCH(1,1), financial returns, multivariate volatility, 
portfolio volatility, time series. 



*Partially supported by an EPSRC research grant and by NSF grant DMS-0355179. 




1 Introduction 



One of the most prolific areas of research in the financial econometrics literature in the last two decades has been the modelling of the time-varying volatility of financial returns. Many statistical models, most of them designed for univariate data, have been proposed for this purpose. From a practical point of view, there are at least two incentives to model several financial returns jointly. First, time-varying correlations among different securities are important and useful information for portfolio optimization, asset pricing and risk management. Secondly, the modelling of a single security may be improved by incorporating relevant information from other securities. The quest for modelling multivariate volatility processes, which are often represented by conditional covariance matrices, has motivated attempts to extend univariate volatility models to multivariate cases, aiming for practical and/or statistical effectiveness. We list some of these endeavours below.
Let $\{X_t\}$ be a vector-valued (return) time series with

$$E(X_t \mid \mathcal{F}_{t-1}) = 0, \qquad \mathrm{Var}(X_t \mid \mathcal{F}_{t-1}) = \Sigma_t = (\sigma_{t,ij}),$$

where $\mathcal{F}_t$ is the $\sigma$-algebra generated by $\{X_t, X_{t-1}, \ldots\}$, and $\Sigma_t$ is an $\mathcal{F}_{t-1}$-measurable $d \times d$ positive semi-definite matrix. One of the most general multivariate GARCH($p, q$) models is the BEKK representation (Engle and Kroner 1995)

$$\Sigma_t = C + \sum_{i=1}^{p}\sum_{j=1}^{m} A_{ij} X_{t-i} X_{t-i}^{T} A_{ij}^{T} + \sum_{i=1}^{q}\sum_{j=1}^{m} B_{ij} \Sigma_{t-i} B_{ij}^{T}, \qquad (1.1)$$

where $C$, $A_{ij}$, $B_{ij}$ are $d \times d$ matrices, and $C$ is positive definite (denoted as $C > 0$). Although the form of the above model is quite general, especially when $m$ is reasonably large (Proposition 2.2 of Engle and Kroner 1995), it suffers from overparametrization. As with multivariate ARMA models, not all parameters in model (1.1) are necessarily identifiable, even when $m = 1$. Overparametrization also leads to a flat likelihood function, making statistical inference intrinsically difficult and computationally troublesome. See, for example, Engle and Kroner (1995), and Jerez, Casals and Sotoca (2001).
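To make the structure of (1.1) concrete, the following minimal Python/NumPy sketch evaluates one step of the recursion. The function name `bekk_sigma` and the list-of-lists layout for the coefficient matrices $A_{ij}$ and $B_{ij}$ are our own illustrative choices, not part of the original model specification.

```python
import numpy as np

def bekk_sigma(C, A, B, X_lags, Sigma_lags):
    """One step of the BEKK recursion (1.1).

    A[i][j] plays the role of A_{i+1, j+1} (p lists of m matrices) and
    X_lags[i] is X_{t-i-1}; B[i][j] plays the role of B_{i+1, j+1}
    (q lists of m matrices) and Sigma_lags[i] is Sigma_{t-i-1}.
    """
    Sigma = np.array(C, dtype=float)            # copy of the constant term C
    for A_i, x in zip(A, X_lags):               # ARCH-type terms, lags 1..p
        xx = np.outer(x, x)
        for A_ij in A_i:
            Sigma += A_ij @ xx @ A_ij.T
    for B_i, S in zip(B, Sigma_lags):           # GARCH-type terms, lags 1..q
        for B_ij in B_i:
            Sigma += B_ij @ S @ B_ij.T
    return Sigma

# Usage: d = 2, p = q = m = 1, with scalar-times-identity coefficient matrices
C = 0.1 * np.eye(2)
Sigma_t = bekk_sigma(C, [[0.3 * np.eye(2)]], [[0.9 * np.eye(2)]],
                     X_lags=[np.array([1.0, 0.0])], Sigma_lags=[np.eye(2)])
```

Even in this small example, each $A_{ij}$ and $B_{ij}$ carries $d^2$ free entries in general, which illustrates the overparametrization discussed above.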

To overcome the difficulties due to overparametrization, a dynamic conditional correlation 
(DCC) model (Engle 2002, Engle and Sheppard 2001) has been proposed. It is based on the 
decomposition 

$$\Sigma_t = D_t R_t D_t, \qquad (1.2)$$

where $D_t = \mathrm{diag}(\sigma_{t,11}^{1/2}, \ldots, \sigma_{t,dd}^{1/2})$, $\sigma_{t,ii}$ is the conditional variance of the $i$-th component of $X_t$, and $R_t = (\rho_{t,ij})$ is the conditional correlation matrix. A simple way to implement such a model is to model each $\sigma_{t,ii}$ with a univariate volatility model and to model the conditional correlations using rolling exponential smoothing as follows

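$$\rho_{t,ij} = \frac{\sum_{k=1}^{t-1} \lambda^{k} \varepsilon_{t-k,i}\, \varepsilon_{t-k,j}}{\Big\{\sum_{k=1}^{t-1} \lambda^{k} \varepsilon_{t-k,i}^{2} \sum_{k=1}^{t-1} \lambda^{k} \varepsilon_{t-k,j}^{2}\Big\}^{1/2}},$$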

where $\lambda \in (0, 1)$ is a smoothing parameter and $\varepsilon_{ti} = X_{ti}/\sigma_{t,ii}^{1/2}$. Even with such a simple specification, estimation typically involves solving a high-dimensional optimization problem since, for example, the Gaussian likelihood function cannot be factorized into several lower-dimensional functions. To overcome the computational difficulty, Engle (2002) proposes a two-step estimation procedure: first fit each $\sigma_{t,ii}$ in (1.2) with a univariate GARCH(1,1) model using the observations on the $i$-th component of $X_t$ only, and then model the conditional correlation matrix $R_t$ by a simple GARCH(1,1) form

$$R_t = S(1 - \theta_1 - \theta_2) + \theta_1 \varepsilon_{t-1}\varepsilon_{t-1}^{T} + \theta_2 R_{t-1}, \qquad (1.3)$$

where $\varepsilon_t$ is the $d \times 1$ vector of standardized residuals obtained from the separate GARCH(1,1) fits for the $d$ components of $X_t$, and $S$ is the sample correlation matrix of $X_t$. Note that there are only two unknown parameters, $\theta_1$ and $\theta_2$, in the dynamic correlation model (1.3), so it can be easily implemented even for very large $d$. However, it may not provide an adequate fit when the components of $X_t$ exhibit different dynamic correlation structures; see the example of a three-dimensional data set in section 4 below. Furthermore, in modelling the volatility of each component, no attempt is made to extract additional information from the other components.
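The two correlation updates above can be sketched in NumPy as follows. This is an illustration under stated assumptions, not the original implementation. `eps` holds standardized residuals with one time point per row. The smoothing parameter `lam` and the values of $\theta_1, \theta_2$ are placeholders. $S$ is computed from `eps` because only the residuals are passed in. The recursion (1.3) is coded literally as written here, starting from $R_0 = S$, and this initialization is our own choice.

```python
import numpy as np

def ewma_correlation(eps, lam):
    """Exponential-smoothing correlation matrix for the time point after the sample.

    eps has shape (n, d); row n-1 is the most recent residual (lag 1),
    row 0 is the oldest (lag n), and the lag-k term gets weight lam**k.
    """
    n = eps.shape[0]
    w = lam ** np.arange(n, 0, -1)              # weights lam**n, ..., lam**1
    M = (eps * w[:, None]).T @ eps              # sum_k lam^k eps_{t-k} eps_{t-k}^T
    sd = np.sqrt(np.diag(M))
    return M / np.outer(sd, sd)

def dcc_path(eps, theta1, theta2):
    """Iterate (1.3): R_t = S(1 - theta1 - theta2) + theta1 eps_{t-1} eps_{t-1}^T + theta2 R_{t-1}."""
    n, d = eps.shape
    S = np.corrcoef(eps, rowvar=False)          # sample correlation matrix (of eps here)
    R = np.empty((n, d, d))
    R[0] = S                                    # assumed starting value
    for t in range(1, n):
        R[t] = (S * (1 - theta1 - theta2)
                + theta1 * np.outer(eps[t - 1], eps[t - 1])
                + theta2 * R[t - 1])
    return R

# Usage with simulated residuals (illustrative only)
rng = np.random.default_rng(0)
eps = rng.standard_normal((300, 3))
R_ewma = ewma_correlation(eps, 0.94)
R_dcc = dcc_path(eps, 0.05, 0.90)
```

Only $\theta_1$ and $\theta_2$ enter `dcc_path`, whatever the dimension $d$. This reflects both the computational appeal of (1.3) and its restriction that all pairs share the same correlation dynamics.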

Alexander (2001) proposes an orthogonal GARCH model which fits each principal component 
(PC) with a univariate GARCH model separately, and treats all PCs as conditionally uncorrelated 
random variables. Since PCs are only unconditionally uncorrelated, such a misspecification may 
lead to non-negligible errors in the fitting; see, for example, Figure 5 and related discussions in 
section 4 below. 

Other multivariate volatility models include, for example, the vectorized multivariate GARCH models of Bollerslev, Engle and Wooldridge (1988), the constant conditional correlation multivariate GARCH models of Bollerslev (1990), the multivariate stochastic volatility model of Harvey, Ruiz and Shephard (1994), the generalized orthogonal GARCH model of van der Weide (2002), and the easy-to-fit ad hoc approach of Wang and Yao (2005); see also the survey by Bauwens, Laurent and Rombouts (2003) and the references therein.

While all the aforementioned models have their own merits, each of them suffers from one or more of three drawbacks: (i) overparametrization, (ii) computational complication, and (iii) being too simple to capture some important dynamic structures.

In this paper, we propose a new modelling methodology which mitigates the above three drawbacks. The basic idea is to assume that $X_t$ is a linear combination of a set of conditionally uncorrelated components (CUCs); see section 2.1 below. One fundamental difference from the orthogonal GARCH model is that we use CUCs, which are genuinely conditionally uncorrelated, instead of PCs. The advantages of the new approach include the following: (i) the CUC decomposition leads to a parsimonious representation for multivariate (matrix-valued) volatility processes, with no model identification problems; (ii) it has the flexibility to model each CUC with any appropriate univariate volatility model; (iii) computationally, it splits a high-dimensional optimization problem into several lower-dimensional subproblems; and (iv) it allows the volatility model for one CUC to depend on the lagged values of the other CUCs.

The idea of using CUCs is similar to that of independent component analysis (Hyvärinen, Karhunen and Oja 2001). However, instead of requiring all the component series to be mutually independent, we only impose the weaker condition that the component series are conditionally uncorrelated; see (2.1) below. Of course, the existence of CUCs is not always guaranteed either. We propose a bootstrap test to assess the feasibility of such an approach. Our empirical experience shows that, for a large number of practical examples, there is no significant evidence to reject the hypothesis that the CUCs exist.

Literature on applying independent component analysis to financial and economic time series includes, for example, Back and Weigend (1997), Kiviluoto and Oja (1998), Malaroiu, Kiviluoto and Oja (2000), and van der Weide (2002). Although our basic idea is somewhat similar to that of van der Weide (2002), our approach is completely different.

The rest of the paper is organized as follows. Section 2 contains a detailed description of the proposed methodology and the associated theoretical results. Simulation results are reported in section 3. Illustrations with real data examples are presented in section 4. Technical proofs are relegated to the appendices.
2 Methodology



2.1 Basic setting 

To simplify matters, we may assume $\mathrm{Var}(X_t) = I_d$, the $d \times d$ identity matrix. In practice, this amounts to replacing $X_t$ by $S^{-1/2}X_t$, where $S$ is the sample covariance matrix of $X_t$. We assume that each component of $X_t$ is a linear combination of $d$ conditionally uncorrelated components (CUCs) $Z_{t1}, \ldots, Z_{td}$, which satisfy the conditions $E(Z_{ti} \mid \mathcal{F}_{t-1}) = 0$, $\mathrm{Var}(Z_{ti}) = 1$, and

$$E(Z_{ti} Z_{tj} \mid \mathcal{F}_{t-1}) = 0 \quad \text{for all } i \neq j. \qquad (2.1)$$

Put $Z_t = (Z_{t1}, \ldots, Z_{td})^{T}$. The above setting implies that

$$X_t = A Z_t, \qquad Z_t = A^{T} X_t, \qquad (2.2)$$

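for a constant matrix $A$. Furthermore, the CUCs have unit variances and, by (2.1), are also unconditionally uncorrelated, so $I_d = \mathrm{Var}(X_t) = A\,\mathrm{Var}(Z_t)A^{T} = AA^{T}$. Hence $A$ is a $d \times d$ orthogonal matrix with $d(d-1)/2$ free elements. Put

$$\mathrm{Var}(Z_t \mid \mathcal{F}_{t-1}) = \mathrm{diag}(\sigma_{t1}^{2}, \ldots, \sigma_{td}^{2}), \qquad (2.3)$$

i.e. $\sigma_{tj}^{2} = \mathrm{Var}(Z_{tj} \mid \mathcal{F}_{t-1})$. It is easy to see that once we have specified $\sigma_{tj}^{2}$, the volatility of the $j$-th CUC, for $j = 1, \ldots, d$, the volatilities of any portfolios can be deduced accordingly. For example, for any portfolios $\xi_t = b_1^{T} X_t$ and $\eta_t = b_2^{T} X_t$, it holds that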

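$$\mathrm{Var}(\xi_t \mid \mathcal{F}_{t-1}) = \sum_{j=1}^{d} \tilde b_{1j}^{2} \sigma_{tj}^{2}, \qquad \mathrm{Cov}(\xi_t, \eta_t \mid \mathcal{F}_{t-1}) = \sum_{j=1}^{d} \tilde b_{1j} \tilde b_{2j} \sigma_{tj}^{2},$$

where $(\tilde b_{k1}, \ldots, \tilde b_{kd}) = b_k^{T} A$ for $k = 1, 2$. Hence, the CUC decomposition (2.2) facilitates parsimonious modelling of a $d$-dimensional multivariate volatility process via $d$ univariate volatility models. In this way, we reduce the number of parameters involved substantially.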
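The portfolio formulas can be checked numerically. In the minimal sketch below, the function name `portfolio_cond_cov` and the example values of $A$, $b_1$, $b_2$ and $\sigma_{tj}^2$ are ours. The conditional variances $\sigma_{tj}^2$ at a fixed $t$ are assumed to be already available from the univariate CUC models. The result agrees with the full matrix form $b_1^{T} A\, \mathrm{diag}(\sigma_{t1}^2, \ldots, \sigma_{td}^2)\, A^{T} b_2$.

```python
import numpy as np

def portfolio_cond_cov(b1, b2, A, sigma2):
    """Cov(b1^T X_t, b2^T X_t | F_{t-1}) when X_t = A Z_t and the CUCs have
    conditional variances sigma2 = (sigma_t1^2, ..., sigma_td^2)."""
    bt1 = np.asarray(b1) @ A                    # (b~_11, ..., b~_1d) = b1^T A
    bt2 = np.asarray(b2) @ A                    # (b~_21, ..., b~_2d) = b2^T A
    return float(np.sum(bt1 * bt2 * sigma2))    # cross terms vanish by (2.1)

# Usage: a 2-dimensional example with a rotation matrix A
c, s = np.cos(0.4), np.sin(0.4)
A = np.array([[c, -s], [s, c]])
sigma2 = np.array([1.5, 0.5])
b1, b2 = np.array([0.6, 0.4]), np.array([1.0, -1.0])
var_xi = portfolio_cond_cov(b1, b1, A, sigma2)       # Var(xi_t | F_{t-1})
cov_xi_eta = portfolio_cond_cov(b1, b2, A, sigma2)   # Cov(xi_t, eta_t | F_{t-1})
```

Only $d$ numbers, one per CUC, need to be tracked over time. Every portfolio variance and covariance is then obtained from them through the fixed matrix $A$.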

2.2 Estimation of CUCs 
2.2.1 Estimation procedure 

By (2.2), $Z_{tj} = a_j^{T} X_t$, and $a_1, \ldots, a_d$ are $d$ orthonormal vectors. The goal is to estimate the orthogonal matrix $A = (a_1, \ldots, a_d)$. Note that the order of $a_1, \ldots, a_d$ is arbitrary and cannot be identified. Furthermore, $a_j$ can be replaced by $-a_j$.
Condition (2.1) is equivalent to

$$\max_{1 \le i < j \le d}\ \sup_{B \in \mathcal{B}_t} \big|E\{Z_{ti} Z_{tj} I(B)\}\big| = 0 \qquad (2.4)$$

for any $\pi$-class $\mathcal{B}_t \subset \mathcal{F}_{t-1}$ such that the $\sigma$-algebra generated by $\mathcal{B}_t$ is equal to $\mathcal{F}_{t-1}$ (Theorem 7.1.1 of Chow and Teicher, 1997). In practice, we use some simple $\mathcal{B}_t$ for the sake of tractability. This leads to choosing an orthogonal matrix $A = (a_1, \ldots, a_d)$ which minimizes

$$\Psi_n(A) = \sum_{1 \le i < j \le d}\ \sup_{B \in \mathcal{B},\, 1 \le k \le k_0} \Big| a_i^{T} \Big\{ \frac{1}{n-k} \sum_{t=k+1}^{n} X_t X_t^{T} I(X_{t-k} \in B) \Big\} a_j \Big|, \qquad (2.5)$$

where $\mathcal{B}$ is a collection of subsets of $\mathbb{R}^d$ and $k_0 \ge 1$ is a prescribed integer. We denote by $\hat A = (\hat a_1, \ldots, \hat a_d)$ the resulting estimator.

Since the order of $a_1, \ldots, a_d$ is arbitrary, we measure the estimation error by

$$D(A, \hat A) = 1 - \frac{1}{d} \sum_{i=1}^{d} \max_{1 \le j \le d} |a_i^{T} \hat a_j|. \qquad (2.6)$$

Note that $D(A, B) \ge 0$ for any orthogonal matrices $A$ and $B$. Moreover, $D(A, B) = 0$ if and only if the columns of $A$ are obtained from a permutation of the columns of $B$ or their reflections, and in that case $\Psi_n(A) = \Psi_n(B)$.
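The distance (2.6) takes only a few lines in NumPy. The sketch below (the function name `d_distance` is ours) confirms numerically that permuting and reflecting columns leaves $D$ at zero. For $d = 2$, a rotation by $\pi/4$ gives $D = 1 - \cos(\pi/4)$.

```python
import numpy as np

def d_distance(A, B):
    """D(A, B) = 1 - (1/d) sum_i max_j |a_i^T b_j| as in (2.6); columns are a_i and b_j."""
    G = np.abs(A.T @ B)                          # G[i, j] = |a_i^T b_j|
    return float(1.0 - G.max(axis=1).mean())

# Usage: permuting and reflecting columns leaves D at zero
rng = np.random.default_rng(1)
A, _ = np.linalg.qr(rng.standard_normal((4, 4)))     # a random 4 x 4 orthogonal matrix
B = A[:, [2, 0, 3, 1]] * np.array([1, -1, -1, 1])    # permuted, partly reflected columns
D_AB = d_distance(A, B)                              # numerically zero
```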

In practice, we may let $\mathcal{B}$ consist of balls with an appropriately selected radius (such that each ball contains sufficiently many data points) centred on a grid in the sample space of $X_t$. For example, we may use as centres of the balls those observations $X_t$ for which at least one of the components of $X_t$ is the 10th, the 20th, ..., or the 90th sample percentile of the corresponding component observations.
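A minimal sketch of the sample criterion (2.5) with the percentile-based balls just described follows. Several details are our assumptions, since the text does not fix them. Each percentile is matched to the observation nearest to it, because a sample percentile need not coincide with an observation. The radius is supplied by the user. $X_t$ is assumed to be standardized already, so that $\mathrm{Var}(X_t) \approx I_d$. The names `percentile_centres` and `psi_n` are hypothetical.

```python
import numpy as np

def percentile_centres(X, levels=range(10, 100, 10)):
    """Ball centres: for each component and each level (10th, ..., 90th percentile),
    take the observation whose component value is closest to that sample percentile."""
    idx = set()
    for c in range(X.shape[1]):
        for q in np.percentile(X[:, c], list(levels)):
            idx.add(int(np.argmin(np.abs(X[:, c] - q))))
    return X[sorted(idx)]

def psi_n(A, X, centres, radius, k0=1):
    """Sample criterion (2.5) with B = closed balls of the given radius around the centres."""
    n, d = X.shape
    sup = np.zeros((d, d))                       # running sup over B and k, per pair (i, j)
    for k in range(1, k0 + 1):
        Xt, Xlag = X[k:], X[:-k]
        for c in centres:
            inside = np.linalg.norm(Xlag - c, axis=1) <= radius   # I(X_{t-k} in B)
            M = Xt[inside].T @ Xt[inside] / (n - k)
            sup = np.maximum(sup, np.abs(A.T @ M @ A))            # |a_i^T M a_j|
    return float(sup[np.triu_indices(d, k=1)].sum())               # sum over i < j

# Usage on i.i.d. data (radius chosen arbitrarily for illustration)
rng = np.random.default_rng(0)
X = rng.standard_normal((500, 3))
centres = percentile_centres(X)
A, _ = np.linalg.qr(rng.standard_normal((3, 3)))
val = psi_n(A, X, centres, radius=1.0, k0=2)
```

Because the absolute value is taken and all pairs $i < j$ are summed, the criterion is unchanged when the columns of $A$ are permuted or reflected. This matches the identifiability discussion around (2.6).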

To overcome the difficulty of handling the constraint $A^{T}A = I_d$ in solving the above optimization problem, we reparametrize $A$ in terms of the decomposition

$$A = \prod_{1 \le i < j \le d} E_{ij}(\varphi_{ij}), \qquad (2.7)$$

where $E_{ij}(\varphi_{ij})$ is obtained from the identity matrix $I_d$ with the following replacements: both the $(i,i)$-th and the $(j,j)$-th elements are replaced by $\cos\varphi_{ij}$, and the $(i,j)$-th and the $(j,i)$-th elements are replaced by $\sin\varphi_{ij}$ and $-\sin\varphi_{ij}$, respectively (Vilenkin 1968, van der Weide 2002). Obviously $E_{ij}(\varphi_{ij})$ is an orthogonal matrix, and so is $A$ given in (2.7). Writing $A$ in (2.2) in the form of (2.7), the constrained minimization of (2.5) over orthogonal $A$ is transformed into an unconstrained minimization problem over the $d(d-1)/2 \times 1$ vector $\varphi = (\varphi_{12}, \ldots, \varphi_{1d}, \ldots, \varphi_{d-1,d})^{T}$. This minimization problem is typically solved by iterative algorithms. We stop the iteration when $D(\hat A_k, \hat A_{k+1})$ is smaller than a prescribed small number, where $\hat A_k$ denotes the value of $A$ in the $k$-th iteration, and $D$ is defined as in (2.6).
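The reparametrization (2.7) can be sketched with NumPy and SciPy as follows. Two choices here are our own assumptions. The product in (2.7) is taken from left to right in the order $\varphi_{12}, \ldots, \varphi_{1d}, \ldots, \varphi_{d-1,d}$. SciPy's Nelder-Mead method, with its own stopping tolerances, stands in for the unspecified iterative algorithm and the $D$-based stopping rule described above. The helper names are hypothetical.

```python
import numpy as np
from itertools import combinations
from scipy.optimize import minimize

def givens(d, i, j, phi):
    """E_ij(phi): identity with cos(phi) at (i,i), (j,j), sin(phi) at (i,j), -sin(phi) at (j,i)."""
    E = np.eye(d)
    E[i, i] = E[j, j] = np.cos(phi)
    E[i, j], E[j, i] = np.sin(phi), -np.sin(phi)
    return E

def orthogonal_from_angles(phi, d):
    """A = prod_{i<j} E_ij(phi_ij), angles ordered (phi_12, ..., phi_1d, ..., phi_{d-1,d})."""
    A = np.eye(d)
    for (i, j), p in zip(combinations(range(d), 2), phi):
        A = A @ givens(d, i, j, p)
    return A

def minimise_over_orthogonal(objective, d, phi0=None):
    """Unconstrained minimisation over the d(d-1)/2 angles instead of over orthogonal A."""
    phi0 = np.zeros(d * (d - 1) // 2) if phi0 is None else np.asarray(phi0, float)
    res = minimize(lambda p: objective(orthogonal_from_angles(p, d)), phi0,
                   method="Nelder-Mead", options={"xatol": 1e-8, "fatol": 1e-12})
    return orthogonal_from_angles(res.x, d)

# Usage: recover a known 2 x 2 rotation by minimising a toy objective
target = orthogonal_from_angles([0.3], 2)
A_hat = minimise_over_orthogonal(lambda A: np.sum((A - target) ** 2), 2)
```

In an actual application, `objective` would be $\Psi_n(\cdot)$ from (2.5). Since every candidate $A$ is built as a product of rotations, orthogonality holds automatically and no constraint handling is needed.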

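Remark 1. In practice, we may replace (2.5) by a weighted version

$$\tilde\Psi_n(A) = \sum_{1 \le i < j \le d}\ \sup_{B \in \mathcal{B},\, 1 \le k \le k_0} \Big| a_i^{T} \Big\{ \frac{\sum_{t=k+1}^{n} X_t X_t^{T} I(X_{t-k} \in B)}{\epsilon_0 + \sum_{t=k+1}^{n} I(X_{t-k} \in B)} \Big\} a_j \Big|,$$

where $\epsilon_0$ is a small constant guarding against a zero denominator. This puts more emphasis on small sets $B$. Furthermore, the supremum over $k$ in (2.5) may be replaced by the summation over $k$.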

2.2.2 Asymptotic properties 

We first introduce two concepts: mixing, which measures how fast the dependence in a time series decays over an increasing time span, and the Vapnik-Cervonenkis (or VC) index, which measures the complexity of a collection of sets.

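Let $\mathcal{F}_i^j$ be the $\sigma$-algebra generated by $\{X_t, i \le t \le j\}$. The $\beta$-mixing coefficients are defined as

$$\beta(n) = E\Big\{ \sup_{B \in \mathcal{F}_n^{\infty}} \big| P(B) - P(B \mid \mathcal{F}_{-\infty}^{0}) \big| \Big\}.$$

(See §2.6.1 of Fan and Yao, 2003.)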

For an arbitrary set of $n$ points $\{x_1, \ldots, x_n\}$, there are $2^n$ possible subsets. We say that $\mathcal{B}$ picks out a certain subset of $\{x_1, \ldots, x_n\}$ if this subset can be written as $B \cap \{x_1, \ldots, x_n\}$ for some set $B$ in $\mathcal{B}$. The collection $\mathcal{B}$ shatters $\{x_1, \ldots, x_n\}$ if each of its $2^n$ subsets can be picked out by $\mathcal{B}$. The VC-index of $\mathcal{B}$ is the smallest $n$ for which no set of size $n$ is shattered by $\mathcal{B}$. A collection of sets $\mathcal{B}$ is called a VC-class if its VC-index is finite. The collections of rectangles, of balls, and of unions of a fixed number of such sets are VC-classes. See Chapter 2.6 of van der Vaart and Wellner (1996) for further discussion of VC-classes.
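To make the shattering definition concrete, here is a small brute-force check for closed intervals on the real line. This example is ours, not from the text. Intervals shatter any two distinct points but no set of three, because the two outer points cannot be picked out without the middle one. By the definition above, their VC-index is therefore 3.

```python
from itertools import combinations

def intervals_shatter(points):
    """True if closed intervals [a, b] pick out every subset of the given distinct points."""
    pts = sorted(points)
    n = len(pts)
    for r in range(1, n + 1):                    # the empty subset is always picked out
        for subset in combinations(range(n), r):
            lo, hi = pts[subset[0]], pts[subset[-1]]   # smallest interval covering the subset
            picked = {i for i in range(n) if lo <= pts[i] <= hi}
            if picked != set(subset):
                return False
    return True
```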

Under the regularity conditions listed below, the estimator $\hat A$ is consistent; see Theorem 1. Its proof is relegated to Appendix A.

(A1) The collection $\mathcal{B}$ of sets in $\mathbb{R}^d$ is a VC-class.

(A2) The process $\{X_t\}$ is strictly stationary with $E\|X_t\|^2 < \infty$, where $\|\cdot\|$ denotes the Euclidean norm. Furthermore, the $\beta$-mixing coefficients of $\{X_t\}$ satisfy $\beta(n) = O(n^{-b})$ for some $b > 0$.

(A3) There exists a $d \times d$ orthogonal matrix $A_0$ which minimises

$$\Psi(A) = \sum_{1 \le i < j \le d}\ \sup_{B \in \mathcal{B},\, 1 \le k \le k_0} \big| E\{ a_i^{T} X_t X_t^{T} a_j I(X_{t-k} \in B) \} \big|.$$

Furthermore, the minimum value of $\Psi$ is attained at an orthogonal matrix $A$ if and only if $D(A, A_0) = 0$.

(A4) $E\|X_t\|^{2p} < \infty$ for some $p > 2$, and the $\beta$-mixing condition in (A2) holds with $b > p/(p-2)$.

(A5) $\Psi(A_0) - \Psi(A) \le -a\, D(A, A_0)$ for any orthogonal matrix $A$ such that $D(A, A_0)$ is smaller than a small but fixed constant, where $a > 0$ is a constant.

Remark 2. Let $\mathcal{H}$ be the set of all $d \times d$ orthogonal matrices. Then $\mathcal{H}$ may be partitioned into equivalence classes defined by the distance $D$ in (2.6) as follows: the $D$-distance between any two elements within an equivalence class is 0, and the $D$-distance between any two elements from different classes is greater than 0. Let $\mathcal{H}_D$ be the quotient space $\mathcal{H}/D$ consisting of these equivalence classes, i.e. we treat $A$ and $B$ as the same element of $\mathcal{H}_D$ if and only if $D(A, B) = 0$. Condition (A3) ensures that $A_0$ is the unique minimiser of $\Psi(A)$ on $\mathcal{H}_D$. In fact, both $\Psi(\cdot)$ and $\Psi_n(\cdot)$ are Lipschitz continuous on $\mathcal{H}_D$ with respect to the $D$-distance; see Lemma 1 in Appendix A below.

Theorem 1. Let $k_0 \ge 1$ be a fixed integer. Under conditions (A1)-(A3), $D(\hat A, A_0) \to 0$ almost surely as $n \to \infty$. If, in addition, condition (A4) holds, then

$$\Psi_n(A) - \Psi(A) = O_p(n^{-1/2}) \quad \text{for any orthogonal } A.$$

Furthermore, $n^{1/2} D(\hat A, A_0) = O_p(1)$ provided that condition (A5) also holds.

When the CUCs exist, namely when $\Psi(A_0) = 0$, $A_0$ corresponds to the transform for the CUCs. When the CUCs do not exist, Theorem 1 continues to hold. In this case, $\Psi(A_0) \neq 0$, and indeed $A_0$ can depend on the $\pi$-class $\mathcal{B}$. In practice, we do not know whether the CUCs exist or not. Our aim then naturally becomes to find an orthogonal transform such that the resulting components are as little conditionally correlated as possible. Observe the conditional correlation criterion

$$\Psi(A) = \sum_{1 \le i < j \le d}\ \sup_{1 \le k \le k_0,\, B \in \mathcal{B}} \big| \mathrm{Corr}(a_i^{T} X_t, a_j^{T} X_t \mid X_{t-k} \in B) \big|\, P(X_{t-k} \in B).$$
Thus, a reasonable criterion is to find an orthogonal transform $A$ that minimizes $\Psi(A)$. The following theorem shows that our estimation method possesses some degree of robustness and is better than the principal component transform in terms of minimizing the conditional correlation criterion $\Psi(A)$.

Theorem 2. Let $k_0 \ge 1$ be a fixed integer. Under conditions (A1) and (A2), for any other orthogonal transform $B$, we have

$$\liminf_{n \to \infty} \{\Psi(\hat A) - \Psi(B)\} \le 0.$$

Theorem 2 shows that, for any other orthogonal transform $B$, the transformed components asymptotically have conditional correlation, in terms of $\Psi(\cdot)$, at least as high as the components transformed by $\hat A$.

2.3 Modelling volatilities for CUCs 

Once the CUCs have been identified, we may fit each of them with any appropriate univariate volatility model, for example, a GARCH model, a stochastic volatility model, or any nonparametric or semiparametric volatility model. As a simple illustration, we present below an extended GARCH(1,1) model for each $\sigma_{tj}^{2}$ given in (2.3).

2.3.1 Extended GARCH(1,1) models 

We assume, for the $j$-th CUC, $j = 1, \ldots, d$,

$$Z_{tj} = \sigma_{tj}\varepsilon_{tj}, \qquad \sigma_{tj}^{2} = \gamma_j + \sum_{i=1}^{d} \alpha_{ji} Z_{t-1,i}^{2} + \beta_j \sigma_{t-1,j}^{2}, \qquad (2.8)$$

where $\{\varepsilon_{tj}, -\infty < t < \infty\}$ is a sequence of i.i.d. random variables with mean 0 and variance 1, $\varepsilon_{tj}$ is independent of $\mathcal{F}_{t-1}$, $\gamma_j > 0$ and $\alpha_{ji}, \beta_j \ge 0$. Compared with the standard GARCH(1,1) model, this model contains $d-1$ extra terms $\sum_{i \neq j} \alpha_{ji} Z_{t-1,i}^{2}$, which incorporate possible associations between the $j$-th CUC and the other CUCs, while the conditional zero-correlation condition (2.1) still holds. Such a dependence is described by saying that the $i$-th component (if $\alpha_{ji} \neq 0$) is causal in variance to the $j$-th component (Engle, Ito and Lin 1991).
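The extended GARCH(1,1) model (2.8) is straightforward to simulate. The minimal sketch below assumes Gaussian $\varepsilon_{tj}$, although the model itself only requires i.i.d. innovations with mean 0 and variance 1. It also replaces the infinite past with a burn-in period and uses illustrative parameter values of our own. The name `simulate_extended_garch` is hypothetical.

```python
import numpy as np

def simulate_extended_garch(gamma, alpha, beta, n, burn=1000, seed=None):
    """Simulate Z_1, ..., Z_n from (2.8) with Gaussian eps_tj (an assumption).

    alpha is d x d with alpha[j, i] = alpha_ji; gamma and beta have length d.
    A burn-in period stands in for the infinite past.
    """
    rng = np.random.default_rng(seed)
    gamma, beta, alpha = (np.asarray(v, float) for v in (gamma, beta, alpha))
    d = len(gamma)
    Z = np.empty((n + burn, d))
    sigma2 = np.ones(d)
    z_prev = np.zeros(d)
    for t in range(n + burn):
        sigma2 = gamma + alpha @ z_prev**2 + beta * sigma2   # second equation of (2.8)
        Z[t] = np.sqrt(sigma2) * rng.standard_normal(d)      # Z_tj = sigma_tj * eps_tj
        z_prev = Z[t]
    return Z[burn:]

# Usage: CUC 2 is causal in variance to CUC 1 (alpha_12 = 0.04 > 0)
alpha = np.array([[0.08, 0.04],
                  [0.00, 0.05]])
beta = np.array([0.80, 0.80])
gamma = 1.0 - beta - alpha.sum(axis=1)     # unit variances, cf. Theorem 3(ii) below
Z = simulate_extended_garch(gamma, alpha, beta, n=20000, seed=1)
```

Even though $\sigma_{t1}^2$ responds to $Z_{t-1,2}^2$, the simulated components remain (conditionally and hence unconditionally) uncorrelated. This is because the innovations are independent across $j$.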

In practice, we expect that $\sigma_{tj}^{2}$ may depend on $Z_{t-1,i}^{2}$ only for a small number of $i$'s, including $i = j$, i.e. many coefficients $\alpha_{ji}$ (for $i \neq j$) may be 0. Section 2.3.3 below outlines a data-analytic approach for building such a component-dependent model.
When $\beta_j \in [0, 1)$, (2.8) implies

$$\sigma_{tj}^{2} = \mathrm{Var}(Z_{tj} \mid \mathcal{F}_{t-1}) = \frac{\gamma_j}{1-\beta_j} + \sum_{i=1}^{d} \alpha_{ji} \sum_{k=1}^{\infty} \beta_j^{k-1} Z_{t-k,i}^{2}. \qquad (2.9)$$

Put $Z_t = (Z_{t1}, \ldots, Z_{td})^{T}$. Theorem 3 below gives a sufficient condition for the existence of a stationary solution to model (2.8).

Theorem 3. (i) The extended GARCH(1,1) model (2.8) defines a unique $d$-dimensional strictly stationary process $\{Z_t\}$ with $E\|Z_t\|^{2} < \infty$ under the condition

$$r \cdot \max\{\alpha_{j1}, \ldots, \alpha_{jd}\} + \beta_j < 1, \qquad 1 \le j \le d, \qquad (2.10)$$

where $r = \max_{1 \le j \le d} d_j$, and $d_j$ is the number of non-vanishing coefficients among $\alpha_{j1}, \ldots, \alpha_{jd}$.
(ii) Under condition (2.10), $E(Z_{tj}^{2}) = 1$ for all $1 \le j \le d$ if and only if

$$\gamma_j = 1 - \beta_j - \sum_{i=1}^{d} \alpha_{ji}, \qquad 1 \le j \le d. \qquad (2.11)$$

The proof of the above theorem is given in Appendix B. When $\alpha_{ji} = 0$ for all $i \neq j$, i.e. each $Z_{tj}$ follows a standard GARCH(1,1) model, (2.10) reduces to $\alpha_{jj} + \beta_j < 1$, which is the necessary and sufficient condition for the existence of a unique strictly stationary solution with finite second moments for the corresponding GARCH(1,1) model; see Chen and An (1998). In practice, condition (2.10) may often be violated, indicating that the GARCH specification is likely inappropriate for $\sigma_{tj}^{2}$. However, if we view the right-hand side of (2.9) as an approximation for $\sigma_{tj}^{2}$, such an approximation process is strictly stationary under the weaker condition $\beta_j < 1$. For further discussion of the approximation point of view, we refer to Penzer, Wang and Yao (2004).
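Conditions (2.10) and (2.11) can be checked mechanically. In the sketch below, the function names are ours, and `alpha[j, i]` stores $\alpha_{ji}$. The first function tests only the sufficient condition (2.10), so a `False` result does not by itself establish non-stationarity. The second returns the $\gamma_j$ that make $E(Z_{tj}^2) = 1$.

```python
import numpy as np

def satisfies_2_10(alpha, beta):
    """Sufficient condition (2.10): r * max_i alpha_ji + beta_j < 1 for all j,
    where r = max_j d_j and d_j counts the non-zero alpha_ji in row j."""
    alpha, beta = np.asarray(alpha, float), np.asarray(beta, float)
    r = int(np.count_nonzero(alpha, axis=1).max())
    return bool(np.all(r * alpha.max(axis=1) + beta < 1))

def gamma_2_11(alpha, beta):
    """gamma_j = 1 - beta_j - sum_i alpha_ji, the choice in (2.11) giving E(Z_tj^2) = 1."""
    return 1.0 - np.asarray(beta, float) - np.asarray(alpha, float).sum(axis=1)

# Usage: r = 2 here because the first row has two non-zero coefficients
ok = satisfies_2_10([[0.08, 0.04], [0.00, 0.05]], [0.80, 0.80])    # 2*0.08 + 0.80 < 1
bad = satisfies_2_10([[0.10, 0.05], [0.00, 0.10]], [0.80, 0.85])   # 2*0.10 + 0.80 = 1
```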

2.3.2 Quasi-MLE

To construct a likelihood, let us assume hypothetically that $\varepsilon_{tj}$ in (2.8) has a density $f(\cdot)$, which can be, for example, the standard normal, a generalized Gaussian or a $t$-distribution. The implied (negative) log-likelihood function for $\theta_j = (\alpha_{j1}, \ldots, \alpha_{jd}, \beta_j)^{T}$ is

$$l_j(\theta_j) = \sum_{t=\nu+1}^{n} \big[ \log \sigma_{tj}(\theta_j) - \log f\{Z_{tj}/\sigma_{tj}(\theta_j)\} \big], \qquad (2.12)$$

for a given integer $\nu \ge 1$, where $\sigma_{tj}(\theta_j)^{2} = \mathrm{Var}(Z_{tj} \mid \mathcal{F}_{t-1})$ is given by (2.8). By (2.9) and (2.11),

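$$\sigma_{tj}(\theta_j)^{2} = \frac{\gamma_j}{1-\beta_j} + \sum_{i=1}^{d} \alpha_{ji} \sum_{k=1}^{\infty} \beta_j^{k-1} Z_{t-k,i}^{2} = 1 - \frac{\sum_{i=1}^{d}\alpha_{ji}}{1-\beta_j} + \sum_{i=1}^{d} \alpha_{ji} \sum_{k=1}^{\infty} \beta_j^{k-1} Z_{t-k,i}^{2}. \qquad (2.13)$$

This form of $\sigma_{tj}(\theta_j)^{2}$ ensures $\mathrm{Var}(Z_{tj}) = 1$; see Theorem 3(ii). The quasi-maximum likelihood estimator $\hat\theta_j$ minimizes (2.12). In practice, we let $Z_{ti} = 0$ for all $t \le 0$ on the right-hand side of (2.13).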
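The following is a minimal sketch of the Gaussian quasi-MLE, that is, (2.12) with $f$ the standard normal density, dropping the additive constant. Several details are our assumptions rather than the authors' procedure. The truncated series (2.13) is computed through the equivalent recursion started at $\sigma_{1j}^2 = \gamma_j/(1-\beta_j)$, which is what $Z_{ti} = 0$ for $t \le 0$ implies. The parameter space is enforced by a penalty. SciPy's Nelder-Mead optimizer and the starting values are also our choices. All function names are hypothetical.

```python
import numpy as np
from scipy.optimize import minimize

def sigma2_path(theta, Z):
    """sigma_t(theta)^2 of (2.13), t = 1..n, with Z_ti = 0 for t <= 0; None if theta is invalid.

    theta = (alpha_j1, ..., alpha_jd, beta_j); gamma_j = 1 - beta_j - sum_i alpha_ji by (2.11).
    """
    alpha, beta = np.asarray(theta[:-1], float), float(theta[-1])
    gamma = 1.0 - beta - alpha.sum()
    if np.any(alpha < 0) or not (0.0 <= beta < 1.0) or gamma <= 0:
        return None
    n = Z.shape[0]
    s2 = np.empty(n)
    s2[0] = gamma / (1.0 - beta)                 # (2.13) at t = 1 when Z_ti = 0 for t <= 0
    lagged = (Z[:-1] ** 2) @ alpha               # sum_i alpha_ji Z_{t-1,i}^2, t = 2..n
    for t in range(1, n):
        s2[t] = gamma + lagged[t - 1] + beta * s2[t - 1]
    return s2

def neg_loglik(theta, Z, j, nu=1):
    """Gaussian version of (2.12), up to an additive constant."""
    s2 = sigma2_path(theta, Z)
    if s2 is None:
        return 1e10                              # penalty outside the parameter space
    z2, s2 = Z[nu:, j] ** 2, s2[nu:]
    return float(np.sum(0.5 * np.log(s2) + 0.5 * z2 / s2))

def qmle(Z, j, nu=1, theta0=None):
    d = Z.shape[1]
    theta0 = np.r_[np.full(d, 0.05), 0.8] if theta0 is None else np.asarray(theta0, float)
    res = minimize(neg_loglik, theta0, args=(Z, j, nu), method="Nelder-Mead",
                   options={"maxiter": 5000, "xatol": 1e-6, "fatol": 1e-8})
    return res.x, res.fun

# Usage on a simulated standard GARCH(1,1) series (d = 1, alpha = 0.1, beta = 0.8)
rng = np.random.default_rng(0)
n, a, b = 3000, 0.10, 0.80
z = np.zeros((n, 1))
s2 = 1.0
for t in range(n):
    s2 = (1 - a - b) + a * (z[t - 1, 0] ** 2 if t > 0 else 0.0) + b * s2
    z[t, 0] = np.sqrt(s2) * rng.standard_normal()
theta_hat, nll = qmle(z, j=0)
```

Because $\gamma_j$ is tied to the other parameters through (2.11), it is not a free parameter here. This matches the definition of $\theta_j$ above.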

2.3.3 Selection of causal components

To obtain a parsimonious representation for $\sigma_{tj}^{2}$, we may select only those significant $Z_{t-1,i}^{2}$ on the right-hand side of the second equation in (2.8). This is particularly important when the number of components $d$ is large. It may be achieved by using ideas for variable selection in regression analysis. Below we outline such an algorithm based on a combination of the stepwise addition method and the BIC criterion.

We start with the standard GARCH(1,1) model (i.e. $\alpha_{jj} \neq 0$ and $\alpha_{ji} = 0$ for $i \neq j$). We then add one more $Z_{t-1,i}^{2}$ at a time, namely the one which maximizes the (quasi-)likelihood. More precisely, suppose the model already contains $(k-1)$ terms $Z_{t-1,j_1}^{2}, \ldots, Z_{t-1,j_{k-1}}^{2}$. We choose an additional term $Z_{t-1,\ell}^{2}$ with $\ell \notin \{j, j_1, \ldots, j_{k-1}\}$ which maximizes the quasi-likelihood function. Note that this is a two-step maximization problem: for each given $\ell \notin \{j, j_1, \ldots, j_{k-1}\}$, we compute the qMLE $\hat\theta_j^{(\ell)}$ for $\theta_j = (\alpha_{jj}, \alpha_{jj_1}, \ldots, \alpha_{jj_{k-1}}, \alpha_{j\ell}, \beta_j)^{T}$ with the constraints $\alpha_{ji} = 0$ for $i \notin \{j, j_1, \ldots, j_{k-1}, \ell\}$. We then choose $\ell \notin \{j, j_1, \ldots, j_{k-1}\}$ to minimize $l_j(\hat\theta_j^{(\ell)})$, and denote by $l_j(k)$ the minimum value and by $j_k$ the index of the selected variable. Put

$$\mathrm{BIC}_j(k) = l_j(k) + (k+2)\log(n - \nu).$$

We choose $\hat k_j$ which minimizes $\mathrm{BIC}_j(k)$ over $0 \le k \le d-1$. Note that $k = 0$ corresponds to the standard GARCH(1,1) fitting for $Z_{tj}$.
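The stepwise search can be written independently of the likelihood details. In this sketch, `fit(j, active)` is a hypothetical user-supplied callable that returns the minimised negative quasi-log-likelihood $l_j$ when $\alpha_{ji}$ is free for $i$ in `active` and fixed at zero otherwise. In practice it would wrap a qMLE fit such as (2.12). The BIC is coded exactly as displayed above. The toy `fit` in the usage example is invented solely to exercise the logic.

```python
import numpy as np

def stepwise_bic(fit, j, d, n, nu=1):
    """Forward stepwise choice of causal components for CUC j,
    with BIC_j(k) = l_j(k) + (k + 2) log(n - nu)."""
    active = [j]                                  # k = 0: standard GARCH(1,1) for Z_tj
    path = {0: (fit(j, tuple(active)), tuple(active))}
    for k in range(1, d):
        candidates = [l for l in range(d) if l not in active]
        scores = {l: fit(j, tuple(active + [l])) for l in candidates}
        best = min(scores, key=scores.get)        # the term that most reduces l_j
        active.append(best)
        path[k] = (scores[best], tuple(active))
    bic = {k: lk + (k + 2) * np.log(n - nu) for k, (lk, _) in path.items()}
    k_hat = min(bic, key=bic.get)
    return k_hat, path[k_hat][1], bic

# Usage with a toy fit: l_j drops by 30 when Z_{t-1,1}^2 enters and by 1 for Z_{t-1,2}^2
def toy_fit(j, active):
    return 100.0 - 30.0 * (1 in active) - 1.0 * (2 in active)

k_hat, selected, bic = stepwise_bic(toy_fit, j=0, d=3, n=1001)
```

In the toy case, the large likelihood gain from component 1 outweighs the BIC penalty, but the small gain from component 2 does not. The search therefore stops at $\hat k_j = 1$.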

2.3.4 LADE 

If the CUCs $Z_{tj}$ are known (i.e. the $a_j$ are known), the asymptotic properties of the qMLE may be derived in a manner similar to Hall and Yao (2003); see also Mikosch and Straumann (2004). For example, the estimator $\hat\theta_j$ would suffer from complicated asymptotic distributions and slow convergence rates if $\varepsilon_{tj}$ were heavy-tailed in the sense that $E(|\varepsilon_{tj}|^{4}) = \infty$. On the other hand, a least absolute deviation estimator (LADE) based on a log-transformation is always asymptotically normal with the standard root-$n$ convergence rate; see Peng and Yao (2003).

To construct the LADE with the constraint $\mathrm{Var}(Z_{tj}) = 1$, we write $\varepsilon_{tj} = v_0 e_{tj}$ in the first equation of (2.8), where the median of $e_{tj}^{2}$ is equal to 1 and $v_0 = 1/\mathrm{STD}(e_{tj})$. With $\sigma_{tj}(\theta_j)^{2}$ expressed as in (2.13), the parameters $\theta_j$ and $v_0$ are (jointly) identifiable. Now

$$\log Z_{tj}^{2} - \log\{\sigma_{tj}(\theta_j)^{2}\} - \log v_0^{2} = \log(e_{tj}^{2}).$$

Since the median of $\log(e_{tj}^{2})$ is 0, the true values of the parameters minimise

$$E\big|\log Z_{tj}^{2} - \log\{\sigma_{tj}(\theta_j)^{2}\} - \log v_0^{2}\big|.$$

Therefore we may estimate the parameters by minimizing

$$\sum_{t=\nu+1}^{n} \big|\log Z_{tj}^{2} - \log\{\sigma_{tj}(\theta_j)^{2}\} - \log v_0^{2}\big|, \qquad (2.14)$$

where $\sigma_{tj}(\theta_j)^{2}$ is given in (2.13), with $\alpha_{ji} = 0$ for the non-causal components in the variance. So far, $\theta_j$ and $v_0$ have been treated as free parameters. The resulting estimators are root-$n$ consistent.

To make explicit use of the condition $\mathrm{Var}(\varepsilon_{tj}) = 1$, we may estimate the parameters $\theta_j$ as follows. With an initial estimate $\hat\theta_j^{(0)}$, let $\hat v_0$ be the reciprocal of the sample standard deviation of the residuals $\{\hat e_{tj}\}$, where $\hat e_{tj} = Z_{tj}/\sigma_{tj}(\hat\theta_j^{(0)})$. With the given $\hat v_0$ and $\hat\theta_j^{(0)}$, we can minimize

$$\sum_{t=\nu+1}^{n} w_t \big[\log Z_{tj}^{2} - \log\{\sigma_{tj}(\theta_j)^{2}\} - \log \hat v_0^{2}\big]^{2},$$

where $w_t = \big|\log Z_{tj}^{2} - \log\{\sigma_{tj}(\hat\theta_j^{(0)})^{2}\} - \log \hat v_0^{2}\big|^{-1}$. We may then update $\hat v_0$ and iterate further until the estimated $\theta_j$ converges. Note that we have used a weighted $L_2$ loss function to approximate the $L_1$ loss in order to expedite the computation.
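The device of approximating the $L_1$ loss by a reweighted $L_2$ loss, with weights equal to the reciprocal absolute residuals from the previous step, is iteratively reweighted least squares. The sketch below illustrates it on a linear regression, which stands in for the nonlinear model in (2.14). It shows the reweighting idea only and is not the authors' LADE code. The small floor `eps` on the residuals is our safeguard against division by zero.

```python
import numpy as np

def lad_irls(y, X, n_iter=200, eps=1e-8, tol=1e-10):
    """LAD fit of y ~ X b by iteratively reweighted least squares.

    Each step solves a weighted L2 problem with w_t = 1 / |y_t - x_t^T b| from the
    previous step, so sum_t w_t r_t^2 equals sum_t |r_t| at the previous fit.
    """
    b = np.linalg.lstsq(X, y, rcond=None)[0]              # ordinary L2 start
    for _ in range(n_iter):
        w = 1.0 / np.maximum(np.abs(y - X @ b), eps)
        sw = np.sqrt(w)
        b_new = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)[0]
        if np.max(np.abs(b_new - b)) < tol:
            return b_new
        b = b_new
    return b

# Usage: heavy-tailed (Cauchy) errors, where LAD is far more stable than least squares
rng = np.random.default_rng(0)
x = rng.uniform(-2, 2, 2000)
X = np.column_stack([np.ones_like(x), x])
y = 1.0 + 2.0 * x + rng.standard_cauchy(2000)
b_lad = lad_irls(y, X)
b_ols = np.linalg.lstsq(X, y, rcond=None)[0]
```

The Cauchy example mirrors the motivation given above. Under heavy-tailed errors, the absolute-deviation criterion remains well behaved where squared-error fitting does not.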

2.4 Inference based on bootstrapping 

A natural question for the proposed approach is whether the CUCs $Z_{t1}, \ldots, Z_{td}$ exist, even though the minimiser $\{\hat a_j\}$ of (2.5) always exists. To address this issue statistically, we may construct a test for the null hypothesis

$$H_0:\ X_t = A Z_t \quad \text{and} \quad Z_t = \mathrm{diag}(\sigma_{t1}, \ldots, \sigma_{td})\, \varepsilon_t,$$

where $A^{T}A = I_d$, $\varepsilon_t = (\varepsilon_{t1}, \ldots, \varepsilon_{td})^{T}$, $\{\varepsilon_{t1}\}, \ldots, \{\varepsilon_{td}\}$ are $d$ independent series, and each of them is a sequence of i.i.d. random variables with mean 0 and variance 1. Note that the null hypothesis above is a sufficient but not necessary condition for the existence of CUCs. The independence condition is required to construct a bootstrap test for this null hypothesis.

Note that when $Z_{ti}$ and $Z_{tj}$ are not conditionally uncorrelated, the left-hand side of (2.4) is equal to a positive constant instead of 0. Therefore, large values of $\Psi_n(\hat A)$ indicate that the CUCs do not exist. We adopt the bootstrap method below to assess how large is large enough to reject $H_0$.

If the null hypothesis $H_0$ cannot be rejected, we may also construct confidence sets for the coefficients of the CUCs (i.e. the columns of $A$) and for the parameters $\theta_j$, based on the same bootstrap scheme. Formally, confidence sets for $\theta_j$ could be constructed from the asymptotic distributions of, for example, the LADE $\hat\theta_j$, which may be derived in a manner similar to Peng and Yao (2003). However, such an approach is based on the assumption that the CUCs are known (i.e. the vectors $a_j$ are known), and therefore fails to take into account the errors due to the estimation of $a_j$.

Let $\hat A = (\hat a_1, \ldots, \hat a_d)$ be the estimator derived from minimizing (2.5). Let $\hat Z_{tj} = \hat a_j^{T} X_t$. Let $\hat\theta_j$ be an estimator of $\theta_j$, such as the LADE defined in section 2.3.4. The bootstrap sampling scheme consists of the three steps below.

(i) For $j = 1, \ldots, d$, draw $\varepsilon_{tj}^{*}$, for $-\infty < t \le n$, by sampling randomly with replacement from the standardized residuals $\{\hat\varepsilon_{\nu+1,j}, \ldots, \hat\varepsilon_{nj}\}$, which are obtained by standardizing the raw residuals

$$\hat Z_{tj}/\sigma_{tj}(\hat\theta_j), \qquad t = \nu+1, \ldots, n.$$

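(ii) For $j = 1, \ldots, d$, let $Z_{tj}^{*} = \sigma_{tj}^{*} \varepsilon_{tj}^{*}$ for $-\infty < t \le n$, where

$$\sigma_{tj}^{*2} = 1 - \hat\beta_j - \sum_{i=1}^{d} \hat\alpha_{ji} + \sum_{i=1}^{d} \hat\alpha_{ji} Z_{t-1,i}^{*2} + \hat\beta_j \sigma_{t-1,j}^{*2}.$$

(iii) Let $X_t^{*} = \hat A (Z_{t1}^{*}, \ldots, Z_{td}^{*})^{T}$ for $t = 1, \ldots, n$.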
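Steps (i)-(iii) can be sketched in NumPy as shown below. The following are our assumptions. The infinite past $t \to -\infty$ is replaced by a finite burn-in period. Each column of residuals is standardized to sample mean 0 and variance 1. The input arrays hold $\hat A$, $\hat\alpha_{ji}$ (as `alpha_hat[j, i]`) and $\hat\beta_j$, computed elsewhere. The function names and example values are hypothetical.

```python
import numpy as np

def standardize(r):
    """Centre and scale raw residuals to sample mean 0 and variance 1 (column-wise)."""
    return (r - r.mean(axis=0)) / r.std(axis=0)

def bootstrap_sample(A_hat, alpha_hat, beta_hat, resid, n, burn=500, seed=None):
    """One bootstrap series X*_1, ..., X*_n following steps (i)-(iii).

    resid: (m, d) raw residuals hat Z_tj / sigma_tj(hat theta_j), column j for CUC j.
    alpha_hat[j, i] = hat alpha_ji; beta_hat has length d. The infinite past is
    replaced by a burn-in period (an assumption).
    """
    rng = np.random.default_rng(seed)
    e = standardize(np.asarray(resid, float))
    d = e.shape[1]
    gamma = 1.0 - beta_hat - alpha_hat.sum(axis=1)           # as in (2.11)
    # Step (i): resample each column independently, with replacement
    eps = np.column_stack([rng.choice(e[:, j], size=n + burn) for j in range(d)])
    # Step (ii): run the fitted extended GARCH(1,1) recursion
    Z = np.empty((n + burn, d))
    s2, z_prev = np.ones(d), np.zeros(d)
    for t in range(n + burn):
        s2 = gamma + alpha_hat @ z_prev**2 + beta_hat * s2
        Z[t] = np.sqrt(s2) * eps[t]
        z_prev = Z[t]
    # Step (iii): rotate back, X*_t = A_hat Z*_t (stored as rows)
    return Z[burn:] @ A_hat.T

# Usage with illustrative inputs
rng = np.random.default_rng(3)
resid = rng.standard_t(df=6, size=(400, 2))
A_hat = np.array([[0.6, -0.8], [0.8, 0.6]])
alpha_hat = np.array([[0.10, 0.00], [0.02, 0.08]])
beta_hat = np.array([0.80, 0.85])
X_star = bootstrap_sample(A_hat, alpha_hat, beta_hat, resid, n=300, seed=7)
```

Resampling each column separately imposes the independence across components required by $H_0$. This is why the bootstrap world has genuine CUCs.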

A test for the existence of the CUCs. Let $\Psi_n^{*}(A)$ be defined as in (2.5) with $\{X_t\}$ replaced by $\{X_t^{*}\}$, and let the bootstrap estimator $\hat A^{*} = (\hat a_1^{*}, \ldots, \hat a_d^{*})$ be computed in the same manner as $\hat A$ with $\{X_t\}$ replaced by $\{X_t^{*}\}$. Note that the bootstrap sample $\{X_t^{*}\}$ is drawn from a model with $\hat a_j^{T} X_t^{*}$, $j = 1, \ldots, d$, as its genuine CUCs. Hence the conditional distribution of $\Psi_n^{*}(\hat A^{*})$ (given the original sample $\{X_t\}$) may be taken as an approximation to the distribution of $\Psi_n(\hat A)$ under $H_0$. Thus we reject $H_0$ if $\Psi_n(\hat A)$ is greater than the $\lfloor B\alpha \rfloor$-th largest value of $\Psi_n^{*}(\hat A^{*})$ in $B$ replications of the above bootstrap resampling, where $\alpha \in (0, 1)$ is the size of the test and $B$ is a large integer.
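The decision rule is a simple order-statistic comparison. In the sketch below, the name `bootstrap_reject` is ours, and the argument `size` plays the role of $\alpha$, which avoids a clash with the GARCH coefficients. The function returns the decision together with the critical value, taken as the $\lfloor B\alpha \rfloor$-th largest bootstrap statistic.

```python
import numpy as np

def bootstrap_reject(psi_obs, psi_boot, size=0.05):
    """Reject H0 when psi_obs exceeds the floor(B * size)-th largest bootstrap value."""
    psi_boot = np.sort(np.asarray(psi_boot, float))[::-1]    # descending order
    k = int(np.floor(len(psi_boot) * size))
    if k < 1:
        raise ValueError("need B * size >= 1")
    threshold = psi_boot[k - 1]
    return bool(psi_obs > threshold), float(threshold)

# Usage: with B = 100 bootstrap values 1, ..., 100 and size 0.05, the critical value is 96
reject, crit = bootstrap_reject(97.0, np.arange(1, 101))
```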

Confidence sets for $A$. A bootstrap approximation to a $(1 - \alpha)$ confidence set for the transformation matrix $A$ can be constructed as

$$\{A \mid D(A, \hat A) \le c_\alpha,\ A^{T}A = I_d\}, \qquad (2.15)$$

where $c_\alpha$ is the $\lfloor B\alpha \rfloor$-th largest value of $D(\hat A, \hat A^{*})$ in $B$ replications of the bootstrap resampling. Note that when $A$ is in the confidence set, so is any matrix whose columns form a permutation of the (reflected) columns of $A$; see (2.6).
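A minimal sketch of the radius $c_\alpha$ and the membership check in (2.15) follows. The helper names are ours. In the usage example, small planar rotations of the identity stand in for the bootstrap estimates $\hat A^{*}$. They are not output of the actual procedure.

```python
import numpy as np

def d_distance(A, B):
    """D(A, B) from (2.6)."""
    return float(1.0 - np.abs(A.T @ B).max(axis=1).mean())

def confidence_radius(A_hat, A_boot, size=0.05):
    """c_alpha: the floor(B * size)-th largest of D(A_hat, A*_b) over the bootstrap estimates."""
    dists = np.sort([d_distance(A_hat, A_b) for A_b in A_boot])[::-1]
    k = int(np.floor(len(dists) * size))
    if k < 1:
        raise ValueError("need B * size >= 1")
    return float(dists[k - 1])

def in_confidence_set(A, A_hat, c_alpha):
    """Membership in (2.15) for an orthogonal candidate A."""
    return d_distance(A, A_hat) <= c_alpha

# Usage with stand-in bootstrap estimates: rotations by 0.001, 0.002, ..., 0.100 radians
def rot(a):
    return np.array([[np.cos(a), -np.sin(a)], [np.sin(a), np.cos(a)]])

A_hat = np.eye(2)
A_boot = [rot(0.001 * b) for b in range(1, 101)]
c = confidence_radius(A_hat, A_boot)           # the 5th largest distance, at angle 0.096
```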

Interval estimators for the components of $\theta_j$. A bootstrap confidence interval for any component of $\theta_j$, say $\beta_j$, may be obtained as follows. Repeat the above bootstrap sampling $B$ times for some large integer $B$, resulting in bootstrap estimates $\hat\beta_{j,1}^{*}, \ldots, \hat\beta_{j,B}^{*}$. An approximate $(1 - \alpha)$ confidence interval for $\beta_j$ is $(\hat\beta_{j,(b_1)}^{*}, \hat\beta_{j,(b_2)}^{*})$, where $\hat\beta_{j,(i)}^{*}$ denotes the $i$-th smallest value among $\hat\beta_{j,1}^{*}, \ldots, \hat\beta_{j,B}^{*}$, and $b_1 = \lfloor B\alpha/2 \rfloor$ and $b_2 = \lfloor B(1 - \alpha/2) \rfloor$.
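The percentile interval just described reduces to picking two order statistics. In the sketch below (the function name is ours), `size` stands for $\alpha$. A tiny offset guards the floor operation against floating-point rounding. Clamping $b_1$ at 1 when $B\alpha/2 < 1$ is our own choice, made so that the index stays valid for small $B$.

```python
import numpy as np

def percentile_interval(boot, size=0.05):
    """(beta*_(b1), beta*_(b2)) with b1 = [B size/2], b2 = [B(1 - size/2)] (1-based order statistics)."""
    s = np.sort(np.asarray(boot, float))
    B = len(s)
    b1 = max(int(np.floor(B * size / 2 + 1e-9)), 1)    # clamped at 1 (our choice)
    b2 = int(np.floor(B * (1 - size / 2) + 1e-9))
    return float(s[b1 - 1]), float(s[b2 - 1])

# Usage: B = 1000 bootstrap estimates in arbitrary order
rng = np.random.default_rng(0)
boot = rng.permutation(np.arange(1, 1001))
lower, upper = percentile_interval(boot)           # the 25th and 975th smallest values
```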

3 Simulation 

We conduct a Monte Carlo experiment to illustrate the proposed CUC approach. In particular, we check the accuracy of the estimation of the transformation matrix $A$ in (2.2). We consider a CUC-GARCH(1,1) model with $d = 3$:

$$X_t = A Z_t, \qquad Z_t \mid \mathcal{F}_{t-1} \sim N\big(0, \mathrm{diag}\{\sigma_{t1}^{2}, \sigma_{t2}^{2}, \sigma_{t3}^{2}\}\big), \qquad (3.1)$$

where $\sigma_{ti}^{2} = \gamma_i + \alpha_i Z_{t-1,i}^{2} + \beta_i \sigma_{t-1,i}^{2}$, and



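$$A = \begin{pmatrix} 0.500 & 0.866 & 0 \\ 0.866 & -0.500 & 0 \\ 0 & 0 & -1 \end{pmatrix}, \qquad \begin{array}{cccc} i & \gamma_i & \alpha_i & \beta_i \\ \hline 1 & 0.02 & 0.08 & 0.90 \\ 2 & 0.10 & 0.10 & 0.80 \\ 3 & 0.28 & 0.12 & 0.60 \end{array}$$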
It is easy to see that $A^{T}A = I_3$ and $\gamma_i = 1 - \alpha_i - \beta_i$, so that the variances of the CUCs are 1 [see (2.11)]. Since $\alpha_1 + \beta_1 = 0.98$ is very close to 1, the volatility of the first CUC is highly persistent. By contrast, the volatility persistence of the third component is less pronounced, as $\alpha_3 + \beta_3 = 0.72$ only.
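For readers who wish to reproduce the design, the sketch below draws one sample from (3.1) with the parameter values listed above. The original experiment was run in MATLAB. The Python code, the burn-in length of 1000 and the seed are our own choices, since the text does not describe how the simulations were initialized. Because 0.866 is $\sqrt{3}/2$ rounded to three decimals, $A^{T}A = I_3$ holds only up to that rounding.

```python
import numpy as np

# Parameters of (3.1) as listed above; 0.866 is the rounded value of sqrt(3)/2
A = np.array([[0.500, 0.866, 0.0],
              [0.866, -0.500, 0.0],
              [0.0, 0.0, -1.0]])
alpha = np.array([0.08, 0.10, 0.12])
beta = np.array([0.90, 0.80, 0.60])
gamma = 1.0 - alpha - beta               # (0.02, 0.10, 0.28): unit CUC variances by (2.11)

def simulate_model_31(n, burn=1000, seed=None):
    """Draw X_1..X_n from (3.1); returns (X, Z) with rows X_t^T = (A Z_t)^T and Z_t^T."""
    rng = np.random.default_rng(seed)
    Z = np.empty((n + burn, 3))
    s2, z_prev = np.ones(3), np.zeros(3)
    for t in range(n + burn):
        s2 = gamma + alpha * z_prev**2 + beta * s2     # a separate GARCH(1,1) for each CUC
        Z[t] = np.sqrt(s2) * rng.standard_normal(3)
        z_prev = Z[t]
    Z = Z[burn:]
    return Z @ A.T, Z

X, Z = simulate_model_31(500, seed=0)
```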

For each of 200 samples of size $n = 500$ and $n = 1000$, respectively, drawn from the above model, we estimated the transformation matrix $A$ by minimizing $\Psi_n(A)$ defined in (2.5), using the proprietary optimization routines in MATLAB. Note that, as far as the estimation of $A$ is concerned, two orthogonal matrices are treated as identical if the $D$-distance between them is 0; see (2.6). The coefficients $\alpha_i$, $\beta_i$ and $\gamma_i$ were estimated by quasi-MLE based on a Gaussian likelihood. The resulting estimates are summarized in Table 1 and Figure 1.



Table 1: Simulation results: summary statistics of the errors in estimation

                        D(Â,A)    α̂1        β̂1        α̂2        β̂2        α̂3        β̂3
    n = 500    mean     0.0753    0.0719    0.8701    0.0865    0.7506    0.0997    0.6189
               median   0.0474    0.0705    0.8870    0.0830    0.7801    0.0861    0.6445
               STD      0.0714    0.0300    0.0830    0.0469    0.1469    0.0600    0.2017
               bias               -0.0081   -0.0299   -0.0135   -0.0494   -0.0203   0.0189
               RMSE               0.0303    0.0888    0.0484    0.1546    0.0629    0.2022
    n = 1000   mean     0.0679    0.0722    0.8921    0.0846    0.7751    0.0937    0.6307
               median   0.0434    0.0731    0.8999    0.0833    0.7956    0.0938    0.6517
               STD      0.0648    0.0224    0.0400    0.0346    0.1065    0.0412    0.1634
               bias               -0.0078   -0.0079   -0.0154   -0.0249   -0.0263   0.0307
               RMSE               0.0234    0.0403    0.0384    0.1191    0.0487    0.1660



Since both the mean and the standard deviation of $D(\hat A, A)$ are very small, the estimation of $A$ is accurate. The coefficients in each CUC model were also estimated accurately. The estimation errors decrease as the sample size increases from 500 to 1000.
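The summary rows of Table 1 can be computed from the replicated estimates of a single parameter as sketched below. The function name is ours. The text does not state whether STD uses the $n$ or $n-1$ divisor, so the choice of $n-1$ here is an assumption. Whatever the divisor, RMSE$^2$ equals bias$^2$ plus the $n$-divisor variance, which the usage example checks.

```python
import numpy as np

def error_summary(estimates, true_value):
    """Summary rows of Table 1 for one parameter across replications (STD uses ddof=1)."""
    est = np.asarray(estimates, float)
    err = est - true_value
    return {"mean": est.mean(), "median": np.median(est), "STD": est.std(ddof=1),
            "bias": err.mean(), "RMSE": np.sqrt(np.mean(err ** 2))}

# Usage with five made-up estimates of alpha_1 = 0.08
summary = error_summary([0.07, 0.08, 0.06, 0.09, 0.075], true_value=0.08)
```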

All biases reported in Table 1 are negative except those for $\beta_3$; see also Figure 1. This indicates that most of the coefficients in the GARCH(1,1) models for the CUCs were slightly underestimated. Also note that the estimation errors decrease as the volatility persistence (measured by $\alpha_i + \beta_i$) increases; see the upper panel of Figure 1 for the estimation with sample size 1000. For comparison, the estimation errors of the GARCH coefficients when the true $A$ is used are plotted in the lower panel. The differences are small.

4 Real data examples 

In this section we illustrate the proposed method with two real data sets. 

The first data set, denoted as SCI, consists of the 2275 daily log returns (in percentages) of the S&P 500 index, the stock price of Cisco Systems and the stock price of Intel Corporation from 2 January 1991 to 31 December 1999. This data set has been analyzed in Tsay (2001). Figure 2 depicts the time series plots of the three series. Descriptive statistics are listed in Table 2. Clearly, the unconditional distributions of all these series exhibit excess kurtosis, indicating significant departures from normality.

The Ljung-Box Q statistics suggest some plausible autocorrelation in these series. However, this may be due to the conditional heteroscedasticity. Hence we compute the p-values of these Q tests by a bootstrap procedure. For each mean-deleted component return series, we first fit a univariate GARCH(1,1) model

$$Y_t = \sigma_t \varepsilon_t, \qquad \sigma_t^2 = \alpha_0 + \alpha_1 Y_{t-1}^2 + \beta_1 \sigma_{t-1}^2,$$

and denote the estimated parameters by $\hat\alpha_0, \hat\alpha_1, \hat\beta_1$ and the standardized residuals by $\hat\varepsilon_t$. We draw $\varepsilon_t^*$ randomly with replacement from $\{\hat\varepsilon_t, t = 1, \ldots, n\}$ and generate $Y_t^*$ from

$$Y_t^* = \sigma_t^* \varepsilon_t^*, \qquad \sigma_t^{*2} = \hat\alpha_0 + \hat\alpha_1 Y_{t-1}^{*2} + \hat\beta_1 \sigma_{t-1}^{*2}.$$

Let $Q^*$ be the Q-statistic based on $\{Y_t^*\}$. The p-value of $Q$ is estimated by the relative frequency of the event $Q^* > Q$ in 1000 bootstrap replications. In Table 2, these p-values are listed in parentheses below the corresponding Q statistics. They show no significant evidence of autocorrelation in any of the three component series. Accordingly, there is no need to fit a VAR model for the conditional mean of this data set.
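The bootstrap p-value described above can be sketched in Python as follows. This is a minimal illustration, not the authors' code: it assumes the GARCH(1,1) parameters $(\hat\alpha_0, \hat\alpha_1, \hat\beta_1)$ and the standardized residuals are already available from a quasi-maximum likelihood fit (the fitting step is omitted), and the function names `ljung_box`, `simulate_garch` and `bootstrap_pvalue` are hypothetical. The simulated recursion starts from the stationary variance $\hat\alpha_0/(1-\hat\alpha_1-\hat\beta_1)$, which requires $\hat\alpha_1+\hat\beta_1<1$.

```python
import random


def ljung_box(y, m):
    """Ljung-Box statistic Q(m) = n(n+2) * sum_{k=1}^{m} r_k^2 / (n-k)."""
    n = len(y)
    mean = sum(y) / n
    d = [v - mean for v in y]
    denom = sum(v * v for v in d)
    total = 0.0
    for k in range(1, m + 1):
        r_k = sum(d[t] * d[t - k] for t in range(k, n)) / denom  # lag-k autocorrelation
        total += r_k * r_k / (n - k)
    return n * (n + 2) * total


def simulate_garch(alpha0, alpha1, beta1, eps):
    """Y_t = sigma_t * eps_t with sigma_t^2 = alpha0 + alpha1 * Y_{t-1}^2 + beta1 * sigma_{t-1}^2.

    sigma_0^2 is set to the stationary variance alpha0 / (1 - alpha1 - beta1) and Y_0 = 0,
    so alpha1 + beta1 < 1 is required.
    """
    sigma2 = alpha0 / (1.0 - alpha1 - beta1)
    y_prev = 0.0
    path = []
    for e in eps:
        sigma2 = alpha0 + alpha1 * y_prev ** 2 + beta1 * sigma2
        y_prev = sigma2 ** 0.5 * e
        path.append(y_prev)
    return path


def bootstrap_pvalue(q_obs, params, std_resid, m=10, n_boot=1000, seed=0):
    """Relative frequency of Q* > q_obs over n_boot residual-bootstrap replications."""
    rng = random.Random(seed)
    n = len(std_resid)
    exceed = 0
    for _ in range(n_boot):
        eps_star = [rng.choice(std_resid) for _ in range(n)]  # draw with replacement
        y_star = simulate_garch(*params, eps_star)
        if ljung_box(y_star, m) > q_obs:
            exceed += 1
    return exceed / n_boot
```

For a series `y` with fitted parameters `params` and standardized residuals `resid`, the call `bootstrap_pvalue(ljung_box(y, 10), params, resid)` returns the bootstrap p-value of $Q(10)$. Resampling the standardized residuals, rather than assuming Gaussian innovations, keeps the heavy tails of the data in the null distribution of $Q$.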

Let $Y_t$ be the mean-deleted returns of SCI. Let $\hat\Sigma = P\Lambda P^\top$ be the sample covariance matrix of $Y_t$, where $PP^\top = I_3$ and $\Lambda$ is diagonal. Let $X_t = \Lambda^{-1/2}P^\top Y_t$. Then we may regard the (unconditional) covariance matrix of $X_t$ as $I_3$.
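The standardization $X_t = \Lambda^{-1/2}P^\top Y_t$ can be sketched in plain Python as below. This is an illustration under our own assumptions: the returns are given as a list of $d$-vectors, and the spectral decomposition is computed with cyclic Jacobi rotations only to keep the example dependency-free (in practice a library routine such as `numpy.linalg.eigh` would be used). All helper names are hypothetical.

```python
import math


def matmul(x, y):
    """Product of two matrices stored as lists of rows."""
    return [[sum(x[i][k] * y[k][j] for k in range(len(y))) for j in range(len(y[0]))]
            for i in range(len(x))]


def transpose(x):
    return [list(col) for col in zip(*x)]


def jacobi_eigh(a, tol=1e-20, max_sweeps=50):
    """Return (eigenvalues, P) with a = P diag(eigenvalues) P^T for a symmetric matrix a."""
    d = len(a)
    a = [[float(v) for v in row] for row in a]
    p = [[float(i == j) for j in range(d)] for i in range(d)]
    for _ in range(max_sweeps):
        if sum(a[i][j] ** 2 for i in range(d) for j in range(d) if i != j) < tol:
            break
        for i in range(d - 1):
            for j in range(i + 1, d):
                # Angle that zeroes the (i, j) entry of rot^T a rot.
                theta = 0.5 * math.atan2(2.0 * a[i][j], a[j][j] - a[i][i])
                c, s = math.cos(theta), math.sin(theta)
                rot = [[float(r == q) for q in range(d)] for r in range(d)]
                rot[i][i] = rot[j][j] = c
                rot[i][j], rot[j][i] = s, -s
                a = matmul(transpose(rot), matmul(a, rot))
                p = matmul(p, rot)  # accumulate eigenvectors as the columns of p
    return [a[k][k] for k in range(d)], p


def sample_cov(ys):
    """Sample covariance matrix (divisor n) of a list of d-vectors."""
    n, d = len(ys), len(ys[0])
    mu = [sum(y[k] for y in ys) / n for k in range(d)]
    return [[sum((y[i] - mu[i]) * (y[j] - mu[j]) for y in ys) / n for j in range(d)]
            for i in range(d)]


def whiten(ys):
    """X_t = Lambda^{-1/2} P^T Y_t, where sample_cov(ys) = P Lambda P^T."""
    lam, p = jacobi_eigh(sample_cov(ys))
    d = len(lam)
    return [[sum(p[k][i] * y[k] for k in range(d)) / math.sqrt(lam[i]) for i in range(d)]
            for y in ys]
```

By construction the sample covariance of the whitened vectors is the identity, which is the sense in which the covariance of $X_t$ is regarded as $I_3$.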
Table 2: Summary Statistics of the Two Real Data Sets

            SCI data                        Asian market data
            S&P 500   Cisco     Intel       HS        JN        SC        ST        TW
N           2275      2275      2275        1349      1349      1349      1349      1349
Mean        0.0656    0.2567    0.1561      -0.0198   -0.0477   0.0178    -0.0081   -0.0400
Stdev       0.8747    2.8540    2.4644      2.1822    1.7382    1.5401    1.8784    1.9863
Min         -7.1140   -22.1000  -14.5810    -14.7346  -9.0145   -8.7277   -9.1535   -9.9360
Max         4.9900    15.5760   12.8500     20.2083   8.8876    8.8491    19.5559   9.7871
Skewness    -0.3600   -0.3963   -0.2353     0.6419    0.1375    0.1861    0.9114    0.1345
Kurtosis    9.0469    6.7229    5.4701      14.3999   5.0891    8.4310    15.2063   5.4082
Q(10)       22.8322   25.3861   6.8567      32.2251   8.8471    12.9372   28.6943   16.9723
            (0.2440)  (0.0870)  (0.8180)    (0.1760)  (0.7540)  (0.7770)  (0.2180)  (0.2540)
Q(20)       44.2898   33.9490   30.3427     46.1651   19.1511   26.9255   40.7220   28.4664
            (0.2300)  (0.2500)  (0.1170)    (0.2810)  (0.7200)  (0.7310)  (0.2870)  (0.3290)


Note: Q(k) denotes the Ljung-Box portmanteau test statistic with k lags. Figures in parentheses are the corresponding p-values based on 1000 bootstrap replications.



Based on the data $X_t$, an estimator $\hat A$ was obtained with $\Psi_n(\hat A) = 0.1732$. A GARCH(1,1) model was then fitted to each CUC. The estimated coefficients, listed in Table 3, show that the volatility of the first and third CUCs is highly persistent, with $\hat\alpha_1 + \hat\beta_1 = 0.9925$ and $\hat\alpha_3 + \hat\beta_3 = 0.9611$. (One may fit the first CUC with an IGARCH model.) On the other hand, the volatility of the second CUC is less persistent, with $\hat\alpha_2 + \hat\beta_2 = 0.80$.

We applied the bootstrap procedure (with 500 replications) described in section 2.4 to test the existence of the CUCs. The p-value is 0.60, indicating no strong evidence against the hypothesis that CUCs exist. The $(1-\alpha)$ bootstrap confidence set for the transformation matrix $A$ is $\{A : D(\hat A, A) \le c_\alpha,\ A^\top A = I_3\}$, with $c_\alpha = 0.1718$ for $\alpha = 0.05$ and $c_\alpha = 0.1368$ for $\alpha = 0.1$. Since $D(\hat A, I_3) = 0.2593$, $I_3$ is not contained in either confidence set. Because $X_t$ consists of the standardized principal components of $Y_t$, this indicates that the principal components cannot be taken as the CUCs. The confidence intervals for the parameters of each CUC-GARCH(1,1) model are listed in Table 3. The confidence intervals become longer as the volatility persistence, measured by $\alpha_j + \beta_j$, decreases. This is consistent with the finding of the simulation study reported in section 3.
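The confidence-set check above can be sketched as follows. The definition used here, $D(A, B) = 1 - d^{-1}\sum_{i}\max_{j}|a_i^\top b_j|$, is our assumption about the definition of the D-distance given earlier in the paper (it reduces to the form in (5.1) once the vectors are paired and sign-aligned); with the estimated vectors of Table 3 it reproduces $D(\hat A, I_3) \approx 0.2593$. Function and constant names are hypothetical.

```python
def d_distance(a, b):
    """D(A, B) = 1 - (1/d) * sum_i max_j |a_i^T b_j| for two lists of d vectors."""
    d = len(a)
    total = sum(max(abs(sum(x * y for x, y in zip(ai, bj))) for bj in b) for ai in a)
    return 1.0 - total / d


def in_confidence_set(a_hat, a, c_alpha):
    """Membership of a (assumed orthogonal) in {A : D(A_hat, A) <= c_alpha}."""
    return d_distance(a_hat, a) <= c_alpha


# Estimated vectors a_1, a_2, a_3 as reported in Table 3.
A_HAT = [(-0.5605, -0.0018, -0.8081),
         (0.5693, 0.7217, -0.3939),
         (0.6015, -0.6922, -0.3989)]
I3 = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)]

print(round(d_distance(A_HAT, I3), 4))          # 0.2593
print(in_confidence_set(A_HAT, I3, 0.1718))     # False: I_3 lies outside the 95% set
```

Taking the maximum over $j$ makes $D$ invariant to reordering and sign flips of the columns, which is why the comparison is made on the quotient space rather than on raw matrices.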

Based on the fitted conditional variances $\hat\sigma_{t,1}^2, \hat\sigma_{t,2}^2, \hat\sigma_{t,3}^2$ of the CUCs, the conditional variance matrix of the original series $Y_t$ is equal to

$$\hat\Sigma_{y,t} = \hat W\,\mathrm{diag}(\hat\sigma_{t,1}^2, \hat\sigma_{t,2}^2, \hat\sigma_{t,3}^2)\,\hat W^\top,$$

where $\hat W = P\Lambda^{1/2}\hat A$. Since the volatility processes of the first and third CUCs are highly persistent, they can be modelled with integrated GARCH (IGARCH) models. If so, the volatility processes of the original series and their covariance processes are effectively modelled by mixtures of IGARCH and mean-reverting GARCH models, which is similar to the component GARCH model used in Ding and Granger (1996) to capture the long-memory properties of a univariate volatility process.
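A minimal sketch of how fitted CUC variances are mapped back to conditional covariances and correlations of the original returns (the quantities plotted in Figures 3 and 4). It assumes $P$ and $\Lambda$ come from the spectral decomposition of the sample covariance, and that $\hat A$ stores $a_1, \ldots, a_d$ as its columns. The function names are hypothetical.

```python
import math


def loading_matrix(p, lam, a):
    """W = P Lambda^{1/2} A; p and a are d x d lists of rows, a holds the a_j as columns."""
    d = len(lam)
    return [[sum(p[i][k] * math.sqrt(lam[k]) * a[k][j] for k in range(d)) for j in range(d)]
            for i in range(d)]


def cond_cov(w, sigma2):
    """Sigma_t = W diag(sigma2) W^T, given the CUC conditional variances sigma2 at time t."""
    d = len(w)
    return [[sum(w[i][k] * sigma2[k] * w[j][k] for k in range(d)) for j in range(d)]
            for i in range(d)]


def cond_corr(cov):
    """Conditional correlations rho_ij = Sigma_ij / sqrt(Sigma_ii * Sigma_jj)."""
    d = len(cov)
    return [[cov[i][j] / math.sqrt(cov[i][i] * cov[j][j]) for j in range(d)] for i in range(d)]
```

When all CUC variances equal one, `cond_cov` returns $P\Lambda P^\top$, the unconditional sample covariance. Time variation in the correlations comes entirely from the CUC variances moving at different rates.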





Table 3: Fitted CUC-GARCH(1,1) model for SCI

               Estimate                        95% Confidence Set       90% Confidence Set
$a_1$          $(-0.5605, -0.0018, -0.8081)^\top$
$a_2$          $(0.5693, 0.7217, -0.3939)^\top$    $c_{0.05} = 0.1718$      $c_{0.10} = 0.1368$
$a_3$          $(0.6015, -0.6922, -0.3989)^\top$
$\gamma_1$     0.0074                          (0.0042, 0.0592)         (0.0048, 0.0449)
$\alpha_1$     0.0519                          (0.0316, 0.0915)         (0.0350, 0.0812)
$\beta_1$      0.9406                          (0.8446, 0.9576)         (0.8740, 0.9548)
$\gamma_2$     0.1997                          (0.0460, 0.7138)         (0.0673, 0.5705)
$\alpha_2$     0.0432                          (0.0077, 0.1054)         (0.0107, 0.0926)
$\beta_2$      0.7572                          (0.2446, 0.9289)         (0.3600, 0.9069)
$\gamma_3$     0.0389                          (0.0200, 0.1042)         (0.0239, 0.0870)
$\alpha_3$     0.0884                          (0.0476, 0.1305)         (0.0517, 0.1236)
$\beta_3$      0.8727                          (0.7889, 0.9266)         (0.8051, 0.9140)



Figure 3 depicts the fitted volatility processes for each return series, and Figure 4 displays the conditional correlations among the three component series. Note that the volatility of the S&P 500 index is on a much smaller scale than those of the two individual stocks. Increasing trends can be observed in all three correlation processes over the last three years, which may be connected with the high volatilities of all the return series over the same period. On the other hand, the high volatility of Cisco prices in the middle period did not lead to a high correlation with the other two series. This suggests a unilateral impact from the market on the individual stock.
Figure 5 displays the fitted volatility processes for the three return series based on the orthogonal GARCH(1,1) model of Alexander (2001) and Ding and Engle (2001). Note that the orthogonal GARCH model effectively treats the principal components as conditionally uncorrelated variables, which may overlook important conditional dependence structure in the original data. The time-varying patterns of the three processes in Figure 5 are similar to one another, unlike those of the CUC-GARCH(1,1) fit in Figure 3. In particular, the orthogonal GARCH fit artificially inflates the volatility of the S&P 500 index in the middle period; compare the original time plot of the series in Figure 2. The inflation is due to treating the conditionally correlated principal components as CUCs. As stated above, the identity matrix is indeed not included in the confidence set for $A$.

Our second data set consists of the daily closing returns of five Asian stock indices, namely the Hang Seng index of Hong Kong (HS), the Japan Nikkei 225 index (JN), the Shanghai Composite index of China (SC), the Straits Times index of Singapore (ST) and the Taiwan Weighted index (TW), over the period from 1 August 1997 to 30 December 2003. Adjustments were made to account for differences in the holidays of the five markets. The five return series are plotted in Figure 6, and the descriptive statistics are listed in Table 2. All the sample means of these returns are negative except that of SC. Unlike the three series in SCI, all five series are right-skewed over this period. The bootstrap p-values for the Q statistics, obtained in the same way as before, indicate no significant autocorrelation in any of the five series.

We fitted a CUC-extended GARCH(1,1) model to the mean-deleted return series. The lagged values from the other CUCs were selected using BIC together with a forward search; see section 2.3.3. The fitted extended GARCH(1,1) models for the five CUCs, based on quasi-MLE with a Gaussian likelihood, are reported in Table 4. According to the fitted models, the first CUC is causal in variance to the fifth CUC, the second CUC is causal in variance to the first and third CUCs, and the fifth CUC is causal in variance to the first CUC. On the other hand, no additional variables were selected in the models for the second and fourth CUCs.

Figure 7 displays the fitted volatility processes for the five original stock returns. As expected, the most volatile waves are observed in early 1998 with the onset of the Asian financial crisis, and they are especially pronounced in the Hong Kong and Singapore markets. Although the shock is still large, the impact of the crisis on the Japan and Taiwan markets is less drastic, and its effect on the Shanghai market is on a much smaller scale. In Figure 8, we present the fitted conditional correlations between Hong Kong and the other four markets. The most correlated period coincides with the Asian financial crisis. After that, the correlation between Hong Kong and Singapore remains at an almost constant level, except for two downslides in the middle of 1999 and 2002, respectively. Likewise, the correlation between Hong Kong and Taiwan stays almost constant, although slightly lower than that with the Singapore market. An upward trend can be seen in the correlation between the Hong Kong and Japan markets in the last few years, which suggests that these two markets were becoming more closely integrated. In contrast, the correlation between the Hong Kong and Shanghai markets seems to trend downward towards zero in the last few years. The implications of these observations for international diversification deserve further investigation.



Table 4: Extended GARCH(1,1) for CUCs of Asian Market Data

j   Other CUCs selected   r   Fitted model                                                                                                     BIC
1   5, 2                  2   $\sigma_{t,1}^2 = 0.0271 + 0.8609\sigma_{t-1,1}^2 + 0.0405Z_{t-1,1}^2 + 0.0637Z_{t-1,5}^2 + 0.0117Z_{t-1,2}^2$   3622
2                             $\sigma_{t,2}^2 = 0.0521 + 0.8004\sigma_{t-1,2}^2 + 0.1475Z_{t-1,2}^2$                                          3602
3   2                     1   $\sigma_{t,3}^2 = 0.0077 + 0.9301\sigma_{t-1,3}^2 + 0.0526Z_{t-1,3}^2 + 0.0098Z_{t-1,2}^2$                      3731
4                             $\sigma_{t,4}^2 = 0.0704 + 0.8539\sigma_{t-1,4}^2 + 0.0757Z_{t-1,4}^2$                                          3780
5   1                     1   $\sigma_{t,5}^2 = 0.0122 + 0.8227\sigma_{t-1,5}^2 + 0.1530Z_{t-1,5}^2 + 0.0261Z_{t-1,1}^2$                      2534
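The variance recursion behind each row of Table 4 can be sketched as follows, for illustration only. The coefficients for the first CUC are taken from Table 4; the initial value $\sigma^2_{0,j}$ is passed in as an argument, and the function and constant names are our own assumptions. The quasi-MLE fit and the BIC forward search are not reproduced.

```python
def extended_garch_path(gamma, beta, alpha, z, sigma2_0):
    """Conditional variances of one CUC under an extended GARCH(1,1):

        sigma^2_{t,j} = gamma + beta * sigma^2_{t-1,j} + sum_i alpha[i] * Z^2_{t-1,i}.

    alpha maps a 1-based CUC index i (the CUC's own index included) to its coefficient;
    z[t] is the vector (Z_{t,1}, ..., Z_{t,d}). Returns sigma^2_{t,j} for t = 1, ..., len(z) - 1.
    """
    path = []
    s2 = sigma2_0
    for t in range(1, len(z)):
        s2 = gamma + beta * s2 + sum(a * z[t - 1][i - 1] ** 2 for i, a in alpha.items())
        path.append(s2)
    return path


# First CUC of the Asian market data, coefficients as reported in Table 4.
CUC1 = {"gamma": 0.0271, "beta": 0.8609, "alpha": {1: 0.0405, 5: 0.0637, 2: 0.0117}}
```

A unit shock in $Z_{t-1,5}$ raises $\sigma_{t,1}^2$ by 0.0637, which is what "the fifth CUC is causal in variance to the first CUC" means for the fitted model.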



Finally, we compared the fit of our CUC-based GARCH(1,1) with those of the orthogonal GARCH(1,1) models and Engle's dynamic conditional correlation (DCC) model (1.2) and (1.3), in terms of goodness-of-fit tests based on the Ljung-Box statistic (Tse and Tsui 1999). Note that the DCC model for each component of $Y_t$ reduces to a standard univariate GARCH(1,1) fit.

We define the standardized residual for the i-th series as $\hat u_{ti} = Y_{ti}/\hat\sigma_{t,ii}^{1/2}$, where $\hat\sigma_{t,ii}$ is the $(i,i)$-th element of the fitted conditional variance of $Y_t$. Define

$$C_{t,ij} = \begin{cases} \hat u_{ti}^2 - 1, & i = j, \\ \hat u_{ti}\hat u_{tj} - \hat\rho_{t,ij}, & i \neq j, \end{cases}$$

where $\hat\rho_{t,ij} = \hat\sigma_{t,ij}/(\hat\sigma_{t,ii}\hat\sigma_{t,jj})^{1/2}$ is the estimated conditional correlation between $Y_{ti}$ and $Y_{tj}$. If the model is correctly specified, there is no autocorrelation in $\{C_{t,ij}, t \ge 1\}$ for any fixed $i, j$. Put

$$Q(ij, M) = n\sum_{k=1}^{M} r_{ij,k}^2,$$

where $r_{ij,k}$ is the lag-k sample autocorrelation of $C_{t,ij}$. Intuitively, large values of $Q(ij,M)$ indicate lack of fit for the conditional correlation between the i-th and j-th components of $Y_t$ when $i \neq j$, and lack of fit for the conditional variance of the i-th component when $i = j$. Although the distribution theory of $Q(ij,M)$ is unknown, empirical evidence suggests that $\chi^2_M$ provides a reasonable reference distribution in practice; see Tse and Tsui (1999).
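For concreteness, the statistic $Q(ij, M)$ and the $\chi^2_{10}$ significance stars used in Table 5 can be sketched as below. This is an illustration under our own assumptions (returns given as a list of vectors, fitted conditional covariances as a list of matrices, hypothetical function names). The critical values are the standard upper 10%, 5% and 1% points of $\chi^2_{10}$.

```python
import math

# Upper 10%, 5% and 1% points of the chi-squared distribution with 10 degrees of freedom.
CHI2_10 = {"*": 15.987, "**": 18.307, "***": 23.209}


def autocorr(x, k):
    """Lag-k sample autocorrelation of the series x."""
    n = len(x)
    m = sum(x) / n
    d = [v - m for v in x]
    return sum(d[t] * d[t - k] for t in range(k, n)) / sum(v * v for v in d)


def q_stat(y, cov, i, j, m=10):
    """Q(ij, M) = n * sum_{k=1}^{M} r_{ij,k}^2 computed from the series C_{t,ij}.

    y[t] is the return vector at time t and cov[t] its fitted conditional covariance matrix.
    """
    c = []
    for yt, st in zip(y, cov):
        ui = yt[i] / math.sqrt(st[i][i])
        uj = yt[j] / math.sqrt(st[j][j])
        if i == j:
            c.append(ui * ui - 1.0)
        else:
            c.append(ui * uj - st[i][j] / math.sqrt(st[i][i] * st[j][j]))
    return len(c) * sum(autocorr(c, k) ** 2 for k in range(1, m + 1))


def stars(q):
    """Significance stars for Q(ij, 10) against the chi-squared(10) reference."""
    return max((s for s, crit in CHI2_10.items() if q > crit), key=len, default="")
```

For example, `stars(18.6100)` gives `"**"` and `stars(18.0610)` gives `"*"`, matching the markings attached to those values in Table 5.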

Table 5 lists the values of the Q-statistics with M = 10. The significance levels were gauged according to the $\chi^2_{10}$ distribution. The advantage of the CUC-GARCH model over the orthogonal GARCH model is clear, as the Q-values for the former tend to be smaller, and often substantially smaller, than those for the latter. Furthermore, all the Q-values for the fitted CUC-GARCH models are insignificant at the 10% level, while the test rejects some orthogonal GARCH fits at the 1% level. For example, the p-values for testing the correlations between the S&P 500 and Cisco stock, and between the S&P 500 and Intel stock, are less than 1%, indicating significant autocorrelation. This may explain the inexplicable jumps in the fitted volatility of the S&P 500 from the orthogonal GARCH model in Figure 5. The same phenomenon can be observed in the fit for the second data set: the orthogonal GARCH model fails to provide adequate fits for the Hang Seng index (HS), the Singapore Straits Times index (ST) and the Taiwan Weighted index (TW), as indicated by the large Q-values in Table 5.

Overall, the DCC model performs competitively with the CUC model for the Asian market data. This may be due to a certain degree of homogeneity among the five Asian market indices. For SCI, which consists of one market index and two stock prices, the gain from using CUC over DCC is more pronounced. First, the DCC model seems to fail to capture the dynamic correlation between the returns of the S&P 500 index and the Cisco stock price. Furthermore, although the Q-value of the CUC model for the S&P 500 is marginally larger than that of the DCC model, the Q-values of the CUC models for both the Intel and Cisco prices are smaller than those of the DCC models (markedly so for Intel), suggesting that modelling the volatility dynamics of the Intel and Cisco prices is improved by incorporating information from the other series.

The Q-tests with other values of M lead to patterns similar to those in Table 5 and are therefore omitted to save space.
Table 5: Specification test: Q(10) for cross products of standardized residuals

        SCI data                                Asian market data
i,j     O-GARCH      DCC         CUC-GARCH     O-GARCH      DCC        CUC-GARCH   CUC-Ex GARCH
1       59.9140***   5.9498      6.2050        56.7580***   6.0285     11.4480     8.6961
2       10.5100      9.0587      8.0542        12.3540      7.8517     8.6713      8.7751
3       2.6192       6.4293      2.2397        8.5368       9.2749     8.5301      8.5265
4                                              18.6100**    2.6610     4.0512      3.7954
5                                              18.0610*     7.4710     11.7960     13.5150
1,2     51.8060***   10.4887     10.9090       7.1025       7.0622     4.6433      4.2671
1,3     77.5140***   20.6745**   10.5170       3.8940       4.6465     3.4987      3.5564
1,4                                            17.2180*     4.7943     6.2915      5.8084
1,5                                            9.2396       6.1648     5.6669      6.3143
2,3     5.9453       7.0617      9.6275        9.6031       10.1762    9.6444      9.5912
2,4                                            6.3708       7.7241     3.4542      3.2648
2,5                                            6.8629       5.8438     6.1856      6.9089
3,4                                            11.9120      8.0303     7.3119      5.8486
3,5                                            2.2256       2.1565     1.5721      1.6857
4,5                                            5.4389       4.7838     3.0312      3.1083


Note: 1) ***, ** and * indicate that the corresponding test is significant at the 0.01, 0.05 and 0.1 levels, respectively. 2) i,j in the left column correspond to the orders of the component series in each data set; a single entry i refers to the case i = j. For example, "1,2" stands for the cross product of the standardized residuals of the S&P 500 and Cisco for SCI, and of HS and JN for the Asian market data set.



Appendix A — Proof of Theorem 1 

We introduce some notation first. Let

$$C_{n,k}(B) = (n-k)^{-1}\sum_{t=k+1}^{n} X_t X_t^\top I(X_{t-k} \in B), \qquad C_k(B) = E\{X_t X_t^\top I(X_{t-k} \in B)\}.$$

The lemma below shows that both $\Psi(\cdot)$ and $\Psi_n(\cdot)$ satisfy a Lipschitz-type bound on $\mathcal{H}_d$ in terms of $\{D(\cdot,\cdot)\}^{1/2}$, where $\mathcal{H}_d$ is the quotient space; see Remark 2.

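Lemma 1. For any $U, V \in \mathcal{H}_d$, it holds that

$$|\Psi(U) - \Psi(V)| \le c\,\mathrm{tr}\{E(X_tX_t^\top)\}\{D(U,V)\}^{1/2},$$

$$|\Psi_n(U) - \Psi_n(V)| \le c\,\mathrm{tr}\Big(n^{-1}\sum_{t=1}^{n}X_tX_t^\top\Big)\{D(U,V)\}^{1/2}$$

almost surely, where $c > 0$ is a constant and $\mathrm{tr}(A)$ is the trace of a matrix $A$.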

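Proof. We only prove the lemma for $\Psi(\cdot)$; the result for $\Psi_n(\cdot)$ may be shown in the same manner. Let $U = (u_1, \ldots, u_d)^\top$, $V = (v_1, \ldots, v_d)^\top$, $u_{ijk}(B) = u_i^\top C_k(B)u_j$ and $v_{ijk}(B) = v_i^\top C_k(B)v_j$. We assume that the orders and the directions of $u_i$ and $v_i$ are arranged such that $u_i^\top v_i \in [0, 1]$ for all $i$, and

$$D(U,V) = 1 - \frac{1}{d}\sum_{i=1}^{d}u_i^\top v_i = \frac{1}{d}\sum_{i=1}^{d}(1 - u_i^\top v_i). \qquad (5.1)$$

See the definition of the D-distance. Write the spectral decomposition of $C_k(B)$ as

$$C_k(B) = \sum_{\ell=1}^{d}\mu_\ell(B,k)\,\gamma_\ell\gamma_\ell^\top,$$

where $\mu_1(B,k) \ge \cdots \ge \mu_d(B,k) \ge 0$ are the eigenvalues of $C_k(B)$ and $\gamma_1, \ldots, \gamma_d$ are the corresponding orthonormal eigenvectors. Since $0 \le C_k(B) \le E(X_tX_t^\top)$ in the positive semi-definite ordering, $\mu_\ell(B,k) \le \mu_\ell$ for all $k$ and $B$, where $\mu_1 \ge \cdots \ge \mu_d$ are the eigenvalues of $E(X_tX_t^\top)$. Consequently, noting that $|\gamma_\ell^\top u_j| \le 1$ and $|v_i^\top\gamma_\ell| \le 1$, we have

$$\begin{aligned}
|u_{ijk}(B) - v_{ijk}(B)| &\le \sum_{\ell=1}^{d}\mu_\ell\,|u_i^\top\gamma_\ell\gamma_\ell^\top u_j - v_i^\top\gamma_\ell\gamma_\ell^\top v_j| \\
&\le \sum_{\ell=1}^{d}\mu_\ell\big(|u_i^\top\gamma_\ell\gamma_\ell^\top u_j - v_i^\top\gamma_\ell\gamma_\ell^\top u_j| + |v_i^\top\gamma_\ell\gamma_\ell^\top u_j - v_i^\top\gamma_\ell\gamma_\ell^\top v_j|\big) \\
&\le \sum_{\ell=1}^{d}\mu_\ell\big\{|(u_i - v_i)^\top\gamma_\ell| + |\gamma_\ell^\top(u_j - v_j)|\big\}.
\end{aligned}$$

By the Cauchy-Schwarz inequality, and since $\|u_i - v_i\|^2 = 2(1 - u_i^\top v_i)$ for unit vectors, the above is further bounded by

$$\sum_{\ell=1}^{d}\mu_\ell\big\{\|u_i - v_i\| + \|u_j - v_j\|\big\} = \sqrt{2}\big\{(1 - u_i^\top v_i)^{1/2} + (1 - u_j^\top v_j)^{1/2}\big\}\sum_{\ell=1}^{d}\mu_\ell. \qquad (5.2)$$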

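Note that for $x \neq 0$, it holds that

$$|x + y| - |x| = y\,\mathrm{sgn}(x) + 2(x + y)\{I(-y < x < 0) - I(0 < x < -y)\}. \qquad (5.3)$$

Hence,

$$\begin{aligned}
\Psi(U) &= \sum_{1\le i<j\le d}\ \sup_{1\le k\le k_0,\, B\in\mathcal{B}}\big[|v_{ijk}(B)| + |v_{ijk}(B) + \{u_{ijk}(B) - v_{ijk}(B)\}| - |v_{ijk}(B)|\big] \\
&= \sum_{1\le i<j\le d}\ \sup_{1\le k\le k_0,\, B\in\mathcal{B}}\big[|v_{ijk}(B)| + \{u_{ijk}(B) - v_{ijk}(B)\}\,\mathrm{sgn}\{v_{ijk}(B)\} + 2u_{ijk}(B)\{I(B_1) - I(B_2)\}\big], \qquad (5.4)
\end{aligned}$$

where

$$B_1 = \{v_{ijk}(B) - u_{ijk}(B) < v_{ijk}(B) < 0\}, \qquad B_2 = \{0 < v_{ijk}(B) < v_{ijk}(B) - u_{ijk}(B)\}.$$

On the set $B_1 \cup B_2$,

$$|u_{ijk}(B)| \le |u_{ijk}(B) - v_{ijk}(B)| + |v_{ijk}(B)| \le 2|u_{ijk}(B) - v_{ijk}(B)|.$$

This, combined with (5.2) and (5.4), implies that

$$\begin{aligned}
|\Psi(U) - \Psi(V)| &\le \sum_{1\le i<j\le d}\ \sup_{1\le k\le k_0,\, B\in\mathcal{B}}\Big[\sqrt{2}\big\{(1 - u_i^\top v_i)^{1/2} + (1 - u_j^\top v_j)^{1/2}\big\}\sum_{\ell=1}^{d}\mu_\ell + 2|u_{ijk}(B)|\,I(B_1 \cup B_2)\Big] \\
&\le 5\sqrt{2}\sum_{1\le i<j\le d}\big\{(1 - u_i^\top v_i)^{1/2} + (1 - u_j^\top v_j)^{1/2}\big\}\sum_{\ell=1}^{d}\mu_\ell \\
&\le 10\sqrt{2}\,d\sum_{\ell=1}^{d}\mu_\ell\sum_{i=1}^{d}(1 - u_i^\top v_i)^{1/2}. \qquad (5.5)
\end{aligned}$$

Now the lemma follows from (5.5), the identity $\sum_{\ell=1}^{d}\mu_\ell = \mathrm{tr}\{E(X_tX_t^\top)\}$ and the inequality

$$\sum_{i=1}^{d}(1 - u_i^\top v_i)^{1/2} \le \Big\{d\sum_{i=1}^{d}(1 - u_i^\top v_i)\Big\}^{1/2};$$

see also (5.1). This completes the proof.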
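The elementary identity (5.3), used above to expand $|u_{ijk}(B)|$ around $v_{ijk}(B)$, can be sanity-checked numerically. The sketch below compares both sides on random integer pairs with $x \neq 0$ (integers keep the comparison exact); it is a check under these illustrative choices, not a proof, and the function names are hypothetical.

```python
import random


def lhs(x, y):
    """Left-hand side of (5.3): |x + y| - |x|."""
    return abs(x + y) - abs(x)


def rhs(x, y):
    """Right-hand side of (5.3), stated for x != 0."""
    sgn = 1 if x > 0 else -1
    return y * sgn + 2 * (x + y) * (int(-y < x < 0) - int(0 < x < -y))


def check_identity(trials=10_000, seed=0):
    """Compare both sides of (5.3) on random integer pairs with x != 0."""
    rng = random.Random(seed)
    xs = [v for v in range(-50, 51) if v != 0]
    return all(lhs(x, y) == rhs(x, y)
               for x, y in ((rng.choice(xs), rng.randint(-100, 100)) for _ in range(trials)))
```

The indicator terms are non-zero exactly when $x$ and $x + y$ have opposite signs, which is the situation handled by the sets $B_1$ and $B_2$ in (5.4).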
Proof of Theorem 1. Since $C_{n,k}(B) - C_k(B)$ is a real symmetric matrix, it holds for any unit vectors $a$ and $b$ that

$$|a^\top\{C_{n,k}(B) - C_k(B)\}b| \le \|C_{n,k}(B) - C_k(B)\|,$$

where $\|C_{n,k}(B) - C_k(B)\|$ denotes the sum of the absolute values of the eigenvalues of $C_{n,k}(B) - C_k(B)$. This may be obtained by using the spectral decomposition of $C_{n,k}(B) - C_k(B)$. Consequently, it holds uniformly for any orthogonal matrix $A$ that

$$\begin{aligned}
|\Psi_n(A) - \Psi(A)| &\le \sum_{1\le i<j\le d}\ \sup_{1\le k\le k_0,\, B\in\mathcal{B}}|a_i^\top\{C_{n,k}(B) - C_k(B)\}a_j| \\
&\le \frac{d(d-1)}{2}\ \sup_{1\le k\le k_0,\, B\in\mathcal{B}}\|C_{n,k}(B) - C_k(B)\|. \qquad (5.6)
\end{aligned}$$

Note that the $(i,j)$-th element of $C_{n,k}(B) - C_k(B)$ is

$$\frac{1}{n-k}\sum_{t=k+1}^{n}X_{ti}X_{tj}I(X_{t-k} \in B) - E\{X_{ti}X_{tj}I(X_{t-k} \in B)\},$$

where $X_{ti}$ denotes the i-th element of $X_t$. Since $E|X_{ti}X_{tj}| < \infty$ and $\mathcal{B}$ is a VC-class, the covering number for the set of functions $\{X_{ti}X_{tj}I(X_{t-k} \in B), B \in \mathcal{B}\}$ has a polynomial rate of growth for any underlying probability measure (Theorem 2.6.4, van der Vaart and Wellner 1996). Hence, it is a Glivenko-Cantelli class. It now follows from Theorem 3.4 of Yu (1994) that

$$\sup_{B\in\mathcal{B}}\Big|\frac{1}{n-k}\sum_{t=k+1}^{n}X_{ti}X_{tj}I(X_{t-k} \in B) - E\{X_{ti}X_{tj}I(X_{t-k} \in B)\}\Big| \to 0.$$

Consequently,

$$\sup_{B\in\mathcal{B}}|\lambda_{\max}(B,k)| \to 0, \qquad \sup_{B\in\mathcal{B}}|\lambda_{\min}(B,k)| \to 0,$$

where $\lambda_{\max}(B,k)$ and $\lambda_{\min}(B,k)$ denote, respectively, the maximum and the minimum eigenvalues of $C_{n,k}(B) - C_k(B)$. Thus

$$\sup_{B\in\mathcal{B}}\|C_{n,k}(B) - C_k(B)\| \to 0$$

for $k = 1, \ldots, k_0$. Now it follows from (5.6) that

$$\sup_{A\in\mathcal{H}_d}|\Psi_n(A) - \Psi(A)| \to 0.$$

Combining this with Lemma 1 above and the continuity of the argmax mapping (Theorem 3.2.2 and Corollary 3.2.3, van der Vaart and Wellner, 1996), it holds that $D(\hat A, A_0) \to 0$. This completes the proof of the first part of Theorem 1.
Under the additional condition $E|X_{ti}X_{tj}|^{2p} < \infty$ and the mixing condition given in Condition (A4), Theorem 1 of Arcones and Yu (1994) implies that the set of functions $\{X_{ti}X_{tj}I(X_{t-k} \in B), B \in \mathcal{B}\}$ is a Donsker class, and hence the process $\{\Delta_{n,k}(B), B \in \mathcal{B}\}$ converges weakly to a Gaussian process, where $\Delta_{n,k}(B) = \sqrt{n}\{C_{n,k}(B) - C_k(B)\}$. It follows from (5.3) that

$$\begin{aligned}
\Psi_n(A) &= \sum_{1\le i<j\le d}\ \sup_{B\in\mathcal{B},\, 1\le k\le k_0}\big[|a_i^\top C_k(B)a_j| + n^{-1/2}\,\mathrm{sgn}\{a_i^\top C_k(B)a_j\}\,a_i^\top\Delta_{n,k}(B)a_j \\
&\qquad\qquad + 2a_i^\top C_{n,k}(B)a_j\{I(B_3) - I(B_4)\}\big] \\
&= \Psi(A) + O_P(n^{-1/2}), \qquad (5.7)
\end{aligned}$$

where

$$B_3 = \{-n^{-1/2}a_i^\top\Delta_{n,k}(B)a_j < a_i^\top C_k(B)a_j < 0\}, \qquad B_4 = \{0 < a_i^\top C_k(B)a_j < -n^{-1/2}a_i^\top\Delta_{n,k}(B)a_j\}.$$

The last equality in (5.7) follows from the fact that on $B_3 \cup B_4$,

$$|a_i^\top C_{n,k}(B)a_j| \le |a_i^\top C_k(B)a_j| + n^{-1/2}|a_i^\top\Delta_{n,k}(B)a_j| \le 2n^{-1/2}|a_i^\top\Delta_{n,k}(B)a_j|.$$

It follows from (5.7) and Condition (A5) that

$$\Psi_n(A_0) - \Psi_n(A) = \Psi(A_0) - \Psi(A) + O_P(n^{-1/2}) \le -a\,D(A_0, A) + O_P(n^{-1/2}). \qquad (5.8)$$

Now, substituting $\hat A$ for $A$, the left-hand side of (5.8) must be non-negative by the definition of $\hat A$. The right-hand side of (5.8) would then be negative unless

$$D(A_0, \hat A) = O_P(n^{-1/2}).$$

This completes the proof.

Appendix B — Proof of Theorem 2 

From the proof of Theorem 1, we have

$$\sup_{A\in\mathcal{H}_d}|\Psi_n(A) - \Psi(A)| \to 0. \qquad (5.9)$$

Since $\Psi(A)$ is continuous on the compact quotient space $\mathcal{H}_d$, there exists a minimizer $A_0$. It follows that

$$\begin{aligned}
\Psi(\hat A) - \Psi(B) &= \Psi(A_0) - \Psi(B) + \Psi(\hat A) - \Psi(A_0) \le \Psi(\hat A) - \Psi(A_0) \\
&= \{\Psi(\hat A) - \Psi_n(\hat A)\} + \{\Psi_n(\hat A) - \Psi_n(A_0)\} + \{\Psi_n(A_0) - \Psi(A_0)\}.
\end{aligned}$$

Using the fact that $\Psi_n(\hat A) - \Psi_n(A_0) \le 0$, we conclude from (5.9) that

$$\liminf_{n\to\infty}\{\Psi(\hat A) - \Psi(B)\} \le 0.$$

This completes the proof of Theorem 2.

Appendix C — Proof of Theorem 3 

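For each $j$, there are at most $r$ non-zero $\alpha_{ji}$. Since $\beta_j < 1$, it holds that

$$\sigma_{tj}^2 = \frac{\gamma_j}{1 - \beta_j} + \sum_{i=1}^{d}\sum_{k=1}^{\infty}\alpha_{ji}\beta_j^{k-1}X_{t-k,i}^2.$$

Now Theorem 3 follows immediately from Lemma 2 below by letting $Y_{tj} = X_{tj}^2$ and $\rho_{tj} = \sigma_{tj}^2$. Note that Lemma 2 may be proved in a similar manner to the proof of Theorem 1 of Giraitis et al. (2000); see also section 2.7.1 of Fan and Yao (2003).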

Lemma 2. Consider a $d$-dimensional ARCH($\infty$) process $Y_t = (Y_{t1}, \ldots, Y_{td})^\top$ defined by

$$Y_{tj} = \rho_{tj}\zeta_{tj}, \qquad \rho_{tj} = c_j + \sum_{i=1}^{d}\sum_{k=1}^{\infty}b_{jik}Y_{t-k,i}$$

for $j = 1, \ldots, d$, where $\{\zeta_{tj}\}$ is a sequence of non-negative i.i.d. random variables with $E(\zeta_{tj}) = 1$, and $c_j, b_{jik} \ge 0$. Furthermore, for each $j$, $b_{ji\cdot} \neq 0$ for at most $r\,(>0)$ values of $i$. Then the above model admits a unique strictly stationary solution $\{Y_t\}$ with finite mean

$$E(Y_t) = (I_d - B)^{-1}(c_1, \ldots, c_d)^\top$$

under the condition $\max_{1\le i,j\le d} b_{ji\cdot} < 1/r$, where $b_{ji\cdot} = \sum_{k\ge 1}b_{jik}$, and $B$ is a $d \times d$ matrix with $b_{ji\cdot}$ as its $(j,i)$-th element.
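As a small worked illustration of the mean formula in Lemma 2, the sketch below forms $I_d - B$ and solves $(I_d - B)\mu = (c_1, \ldots, c_d)^\top$ by Gaussian elimination. The function names and toy numbers are hypothetical, and the check of the lemma's condition follows our reading of the statement: at most $r$ non-zero $b_{ji\cdot}$ in each row and $\max_{i,j} b_{ji\cdot} < 1/r$. Together these force every row sum of $B$ below one.

```python
def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting (small dense systems)."""
    n = len(a)
    m = [list(row) + [v] for row, v in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= f * m[col][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][k] * x[k] for k in range(r + 1, n))) / m[r][r]
    return x


def arch_inf_mean(c, b, r):
    """E(Y_t) = (I_d - B)^{-1} c for the ARCH(infinity) model of Lemma 2.

    b[j][i] holds b_{ji.} = sum_k b_{jik}. The condition is checked as read here:
    at most r non-zero entries per row of B and max_{i,j} b_{ji.} < 1/r.
    """
    d = len(c)
    if any(sum(1 for v in row if v != 0.0) > r for row in b):
        raise ValueError("a row of B has more than r non-zero entries")
    if max(max(row) for row in b) >= 1.0 / r:
        raise ValueError("condition max b_{ji.} < 1/r is violated")
    i_minus_b = [[(1.0 if i == j else 0.0) - b[i][j] for j in range(d)] for i in range(d)]
    return solve(i_minus_b, c)
```

For instance, with $c = (0.7, 0.6)^\top$ and $B$ having rows $(0.2, 0.1)$ and $(0, 0.4)$, `arch_inf_mean` returns a mean of one for both components.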
Figure 1: Boxplots of the estimation errors for the CUC-GARCH(1,1) model with $A$ estimated (upper panel) and with the true $A$ (lower panel). The sample size is n = 1000.
Figure 2: Plots of daily log returns of (a) the S&P 500 index, (b) Cisco Systems stock and (c) Intel Corporation stock. The time span is from January 2, 1991 to December 31, 1999, with 2275 observations.
Figure 3: Fitted volatility processes based on the CUC-GARCH(1,1) model for daily log returns of (a) the S&P 500 index, (b) Cisco Systems stock and (c) Intel Corporation stock.
Figure 4: Fitted conditional correlations based on the CUC-GARCH(1,1) model for daily log returns between (a) the S&P 500 index and Cisco Systems stock, (b) the S&P 500 index and Intel Corporation stock, and (c) Cisco Systems stock and Intel Corporation stock.
Figure 5: Fitted volatility processes based on the orthogonal GARCH(1,1) model for daily log returns of (a) the S&P 500 index, (b) Cisco Systems stock and (c) Intel Corporation stock.
Figure 6: Plots of dividend-adjusted daily log returns of (a) the Hang Seng index in Hong Kong, (b) the Japan Nikkei 225 index, (c) the Shanghai Composite index in China, (d) the Singapore Straits Times index, and (e) the Taiwan Weighted index. The time span is from August 1, 1997 to December 30, 2003, with 1349 observations.
Figure 7: Fitted volatility processes based on the CUC-extended GARCH(1,1) model for daily log returns of (a) the Hang Seng index in Hong Kong, (b) the Japan Nikkei 225 index, (c) the Shanghai Composite index in China, (d) the Singapore Straits Times index, and (e) the Taiwan Weighted index.



Figure 8: Fitted conditional correlations between daily log returns of the Hang Seng index (HS) and (a) the Japan Nikkei 225 index (JN), (b) the Shanghai Composite index in China (SC), (c) the Singapore Straits Times index (ST), and (d) the Taiwan Weighted index (TW).